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Abstract.  This  article  discusses  univariate  density  estimation  in  situations  when  the  sample  (hard 
information)  is  supplemented  by  “soft”  information  about  the  random  phenomenon.  These  situations 
arise  broadly  in  operations  research  and  management  science  where  practical  and  computational  rea¬ 
sons  severely  limit  the  sample  size,  but  problem  structure  and  past  experiences  could  be  brought  in.  In 
particular,  density  estimation  is  needed  for  generation  of  input  densities  to  simulation  and  stochastic 
optimization  models,  in  analysis  of  simulation  output,  and  when  instantiating  probability  models.  We 
adopt  a  constrained  maximum  likelihood  estimator  that  incorporates  any,  possibly  random,  soft  in¬ 
formation  through  an  arbitrary  collection  of  constraints.  We  illustrate  the  breadth  of  possibilities  by 
discussing  soft  information  about  shape,  support,  continuity,  smoothness,  slope,  location  of  modes, 
symmetry,  density  values,  neighborhood  of  known  density,  moments,  and  distribution  functions.  The 
maximization  takes  place  over  spaces  of  extended  real-valued  semicontinuous  functions  and  therefore 
allows  us  to  consider  essentially  any  conceivable  density  as  well  as  convenient  exponential  transforma¬ 
tions.  The  infinite  dimensionality  of  the  optimization  problem  is  overcome  by  approximating  splines 
tailored  to  these  spaces.  To  facilitate  the  treatment  of  small  samples,  the  construction  of  these  splines 
is  decoupled  from  the  sample.  We  discuss  existence  and  uniqueness  of  the  estimator,  examine  consis¬ 
tency  under  increasing  hard  and  soft  information,  and  give  rates  of  convergence.  Numerical  examples 
illustrate  the  value  of  soft  information,  the  ability  to  generate  a  family  of  diverse  densities,  and  the 
effect  of  misspecification  of  soft  information. 
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1  Introduction 

It  is  recognized  that  statistical  estimates  can  be  improved  greatly  by  including  contextual  information 
to  supplement  the  information  derived  from  data.  We  refer  to  the  contextual  information  as  soft 
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information,  in  contrast  to  hard  information  derived  from  observations  (data).  In  this  article,  we 
consider  univariate  probability  density  estimation  exploiting,  in  concert,  hard  and  soft  information. 
Although  our  development,  theoretical  and  numerical,  makes  no  distinction  based  on  sample  size,  not 
surprisingly,  it  is  when  the  sample  size  is  small  that  this  fusion  of  hard  and  soft  information  plays 
a  crucial  role  in  producing  quality  estimates.  We  limit  the  scope  to  densities  of  random  variables 
with  distributions  that  are  absolutely  continuous  with  respect  to  the  Lebesgue  measure  on  a  bounded 
interval. 

The  need  for  estimating  probability  density  functions  is  prevalent  across  operations  research  and 
management  science.  For  example,  an  essential  step  in  simulation  analysis  and  stochastic  optimization 
is  the  generation  of  probability  densities  for  input  random  variables;  see  for  example  [11,  27,  5].  Density 
estimation  is  also  needed  when  populating  probability  models  and  when  analyzing  simulation  output 
beyond  their  typical  first  and  second  moments.  In  all  these  situations,  however,  the  sample  available 
is  typically  extremely  small  due  to  practical  and  computational  limitations.  One  is  usually  forced 
to  restrict  the  attention  to  parametric  families  of  densities.  In  this  paper,  we  provide  the  theoretical 
foundations  of  an  alternative  approach  that  brings  in  soft  information  about  problem  structure  and  past 
experiences  to  obtain  reasonable  nonparametric  density  estimates  even  for  very  small  sample  sizes.  The 
approach  has  been  successfully  applied  in  the  context  of  simulation  output  analysis  [65],  uncertainty 
quantification  [58],  as  well  as  estimation  of  errors  in  forecasts  for  commodity  prices  [74]  and  electricity 
demand  [26];  see  also  [55]. 

A  natural  and  widely  studied  approach  to  density  estimation  is  to  adopt  an  M-estimator  with  addi¬ 
tional  constraints  to  account  for  soft  information.  We  continue  this  tradition  by  defining  an  estimator 
that  is  an  optimal  solution  of  a  constrained  maximum  likelihood  problem.  An  appealing  property  of 
such  estimators  is  that  for  any  sample  size,  an  estimate  is  the  best  possible  within  the  class  of  allowable 
functions  according  to  the  given  criterion  (likelihood). 

We  trace  the  consideration  of  soft  information  in  terms  of  shape  constraints  at  least  back  to  [31,  32]. 
More  recent  studies  of  univariate  log-concave  densities  include  [35,  37,  71,  48,  23,  2],  with  computational 
comparisons  in  [60];  see  also  the  review  [72]  and,  in  the  case  of  multivariate  densities,  e.g.,  [12,  13]. 
Convexity  and  monotonicity  restrictions  are  examined  in  [34,  46]  and  monotonicity,  monotonicity  and 
convexity,  U-shape,  as  well  as  unimodality  with  known  mode  are  studied  in  [47,  46].  Unimodal  functions 
are  also  covered  in  [54,  36],  with  the  former  covering  U-shape  as  well.  Monotone,  convex,  and  log-concave 
densities  are  dealt  with  in  [6].  Studies  of  k- monotone  densities  include  [3,  28,  4].  Densities  given  as 
monotone  transformations  of  convex  functions  are  examined  in  [61].  Convex  formulation  of  a  collection 
of  shape  restrictions  is  discussed  in  [49,  50].  We  refer  to  the  recent  dissertation  [22]  and  the  discussion  in 
[13]  for  a  more  comprehensive  review  and  to  [44]  for  the  related  context  of  shape-restricted  regression. 

Although  these  studies  address  important  cases,  there  is  no  overarching  framework  that  allows  for  a 
comprehensive  description  of  soft  information  formulated  by  a  large  variety  of  constraints.  Initial  work 
in  this  direction  is  found  in  [73],  which  deals  with  parametric  nonlinear  least-squares  regression  subject 
to  a  finite  number  of  smooth  equality  and  inequality  constraints.  That  paper  examines  the  asymptotics 
of  the  least-squares  estimator  using  the  convergence  theory  of  constrained  optimization,  specifically  epi- 
convergence.  In  the  context  of  constrained  maximum  likelihood  estimation,  [21]  establishes  consistency 
of  an  estimator  through  a  functional  law  of  large  numbers  and  epi-convergence.  The  latter  work  is  an 
immediate  forerunner  to  the  present  paper. 

Having  adopted  a  nonparametric  constrained  maximum  likelihood  framework,  we  face  technical 
challenges  along  two  axes.  First,  one  needs  to  deal  with  constrained  optimization  problems.  Of  course. 
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in  principle,  constraints  can  be  handled  through  penalties  and  regularizations;  see  for  example  [30,  16, 
43,  39,  64,  67]  and  more  recently  [25,  69,  40,  41,  45,  42,  7].  However,  the  equivalence  and  interpretations 
of  such  reformulations  depends  on  the  successful  selection  of  multipliers  and  penalty  parameters  which 
is  far  from  trivial  in  practice,  especially  in  the  case  of  multiple  constraints.  In  fact,  poor  selection 
of  these  multipliers  and  parameters  may  cause  computational  challenges  due  to  ill-conditioning  of  the 
resulting  optimization  problem  as  well  as  significant  deterioration  of  the  quality  of  the  resulting  density 
estimate.  Moreover,  it  becomes  unclear  in  what  sense,  if  any,  an  estimator  is  “best”  when  an  otherwise 
natural  criterion  such  as  likelihood  is  mixed  with  nonzero  penalty  terms;  see  [21]  for  further  discussion. 
It  is  also  possible  to  devise  specialized  algorithms  such  as  the  iterative  convex  minorant  algorithm 
[35,  37]  to  account  for  certain  constraints  or  modify  “unconstrained”  estimators  such  as  those  based 
on  kernels;  [36]  handles  unimodality,  [6]  considers  monotonicity,  convexity,  and  log-concavity,  and  [15] 
aims  to  reduce  the  number  of  modes;  see  [75,  53]  for  computational  tools.  Again,  it  is  unclear  in  what 
sense,  if  any,  such  estimates  are  “best”  in  the  case  of  finite  samples.  Moreover,  it  is  challenging  to 
generalize  these  approaches  to  handle  other  types  of  soft  information.  We  direct  the  reader  to  [68]  and 
references  therein  for  treatments  of  kernel  estimators  including  a  discussion  of  optimality. 

The  second  challenge  with  a  nonparametric  constrained  maximum  likelihood  framework  is  the 
infinite-dimensionality  of  the  resulting  optimization  problem.  Naturally,  there  is  a  computational  need 
to  consider  families  of  approximating  densities  characterized  by  a  finite  number  of  parameters.  The 
method  of  sieves  [33,  29,  10]  provides  a  framework  for  constructing,  typically,  finite-dimensional  ap¬ 
proximating  subsets  that  are  gradually  refined  as  the  sample  size  grows  and  that  in  the  limit  is  dense 
in  a  function  space  of  interest.  However,  difficulties  arise  from  three  directions.  First,  with  our  focus 
on  small  sample  sizes,  the  linkage  between  sample  size  and  sieves  becomes  untenable.  Second,  in  order 
to  allow  for  the  possibility  of  discontinuous  densities  and  exponential  transformations,  we  choose  as  un¬ 
derlying  space  the  extended  real-valued  lower  or  upper  semicontinous  functions,  but  neither  is  a  linear 
space.  Consequently,  the  mathematically  inbred  tendency  to  obtain  a  finite-dimensional  approximation 
by  relying  on  a  well-chosen  finite  basis  is  problematic;  see  for  example  [18,  45]  for  such  an  approach 
based  on  splines.  Third,  despite  progress  towards  handling  shape  restrictions  on  sieves  (see  for  example 
[20,  19,  17,  49,  50]),  there  is  no  straightforward  way  of  handling  a  comprehensive  set  of  soft  information. 

In  this  paper,  as  in  [21],  we  consider  an  arbitrarily  constrained  maximum  likelihood  estimator  for 
densities.  We  appear  to  be  the  first  to  consider  such  general  constraints  (soft  information)  in  the 
context  of  nonparametric  density  estimation.  The  soft  information  might  even  be  random,  i.e.,  the  soft 
information  may  not  be  known  a  priori  but  is  realized  with  the  sample.  We  give  concrete  formulations 
of  the  constrained  maximum  likelihood  problem  in  the  case  of  soft  information  about  support  bounds, 
semicontinuity,  continuity,  smoothness,  slope  information  and  related  quantities,  monotonicity,  log- 
concavity,  unimodality,  location  of  modes,  symmetry,  bounds  on  density  values,  neighborhood  of  known 
density,  bounds  on  moments,  and  bounds  on  cumulative  distribution  functions.  We  allow  for  any 
combination  of  these,  and  essentially  any  other  constraint  too. 

We  overcome  the  technical  difficulty  caused  by  constraints  through  the  theory  of  constrained  op¬ 
timization,  specifically  epi-convergence,  and  therefore  avoid  tuning  parameters  related  to  penalties 
and  regularization.  With  the  exception  of  the  preliminary  work  [21],  this  paper  is  the  first  to  utilize 
epi-convergence  to  analyze  constrained  density  estimators.  We  overcome  the  difficulty  of  infinite  di¬ 
mensionality  through  the  use  of  a  new  class  of  splines,  epi-splines  [57],  which  are  highly  flexible,  allow 
for  discontinuities,  and  enable  convenient  exponential  transformations.  Here,  for  the  first  time,  the 
theoretical  foundations  for  using  epi-splines  in  density  estimation  are  laid  out.  In  contrast  to  sieves. 
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epi-splines  can  be  constructed  independently  of  the  sample  and  therefore  handles  small  sample  sizes 
naturally.  The  precursor  [21]  relies  on  a  finite  approximation  of  C?  by  Fourier  coefficients.  In  this  paper, 
we  consider  the  spaces  of  extended  real-valued  semicontinuous  functions,  exponential  transformations, 
and  epi-spline  approximations. 

The  reliance  on  epi-convergence  and  epi-splines  allow  us  to  view  the  constrained  maximum  likeli¬ 
hood  problem  as  an  approximation  of  a  limiting  optimization  problem  involving  the  actual  probability 
density,  correct  soft  information,  and  the  full  space  of  semicontinuous  functions;  we  reference  [52]  for 
a  related  study  in  the  context  of  regression  utilizing  graphical  convergence.  Consequently,  we  not  only 
approximate  a  certain  function  space  or  deal  with  hnite  sample  size,  but  study  the  approximation  of  the 
whole  estimation  process  as  formulated  by  the  limiting  optimization  problem.  The  approach  facilitates 
the  examination  of  families  of  estimators  such  as  those  that  are  near-optimal  solutions  of  a  constrained 
maximum  likelihood  problem. 

Our  primary  motivation  is  to  obtain  reasonable  estimates  in  situations  with  little  hard  information 
and  we  provide  a  consistency  result  as  soft  information  is  refined,  quantify  finite  sample  errors,  and 
present  a  small  computational  study  to  motivate  the  estimator  in  that  regard.  Still,  we  also  establish 
consistency  and  quantify  asymptotic  rates,  as  hard  information  is  refined,  under  general  constraints. 

We  focus  exclusively  on  univariate  densities  that  vanish  beyond  a  compact  interval  of  the  real 
line.  Although  most  of  the  results  extend  to  the  unbounded  case  and  higher  dimensions,  technical 
issues  will  then  become  prominent  and  obscure  the  treatment  of  arbitrary  random  constraints  and  the 
supporting  epi-spline  approximations.  Moreover,  with  a  small  sample,  tail  behavior  can  only  come  in 
via  soft  information,  which  is  easily  handled  by  our  framework  but  omitted  here  for  simplicity;  a  few 
experimental  results  can  be  found  in  [66]. 

The  paper  proceeds  in  §2  by  dehning  the  constrained  maximum  likelihood  estimator,  summarizing 
the  underlying  approximation  theory,  which  is  based  on  [57],  and  discussing  existence  and  uniqueness. 
Section  3  exemplify  the  breadth  of  soft  information  that  can  be  included  and  §4  provides  consistency, 
asymptotics,  and  finite  sample  error  results.  A  small  collection  of  numerical  examples  are  featured  in 
§5.  The  paper  is  summarized  in  §6. 

2  Exponential  Epi-Spline  Estimator 

This  section  formulates  a  constrained  maximum  likelihood  problem  and  presents  a  finite-dimensional 
approximation.  We  discuss  existence,  uniqueness,  and  computations.  The  section  also  includes  the 
prerequisite  approximation  results. 

2.1  Constrained  Likelihood  Maximization  and  Epi-Spline  Approximations 

We  consider  a  random  variable  A,  with  — oo  <l<X<u<oo  a.s.  and  a  distribution  that  is  absolutely 
continuous  with  respect  to  the  Lebesgue  measure,  an  iid  sample  . . . ,  A”,  and  a  possibly  random 

set  that  accounts  for  soft  information  about  the  density  of  A;  see  §3  and  §5  for  concrete  examples. 
Realizations  of  are  subsets  of  allowable  functions  on  [l,u].  The  randomly  constrained  maximum 
likelihood  problem  takes  the  form: 


(P”)  :  /”■  G  argmaxj^e  such  that  /  G  P"'  C  A,  /  e  =  1, 


2  =  1 
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where  “argmax”  denotes  the  set  of  optimal  solutions  and  F  is  the  space  of  extended  real-valued  lower 
semicontinuous  (Isc)  functions  /  :  Ft  =  M  U  {—00,00}  excluding  /  =  00,  or  alternatively 

the  space  of  extended  real- valued  upper  semicontinuous  (use)  functions  f  :  [I,  u]  ^  Ft  now  excluding 
/  =  —  00.  The  density  estimator  then  takes  the  form 

e~^  with  /”  a  solution  of  (P^). 

These  spaces  of  functions  under  considerations  are  large  enough  to  capture  essentially  all  densities 
on  [l,u]  including,  of  course,  those  with  discontinuities.  Moreover,  the  ability  to  handle  f{x)  =  00 
ensures  that  exp(— /(x))  =  0  and,  therefore,  the  exponential  transformation  in  {P"")  does  not  eliminate 
the  possibility  of  vanishing  densities  at  points  in  [l,u\.  If  f{x)  =  —00,  then  exp(— /(x))  =  00,  which 
obviously  can  at  most  occur  for  x  in  a  set  of  (Lebesgue)  measure  zero  if  exp(— /)  is  a  density.  The 
exponential  transformation  (see  [30,  16]  for  early  use  of  such  transformations  and  [61]  for  a  broader 
treatment)  automatically  ensures  that  exp(— /(•))  is  nonnegative  and  explicit  constraints  for  that  pur¬ 
pose  are  redundant.  In  addition,  some  types  of  soft  information  are  more  easily  formulated  for  /  than 
for  h  =  exp(— /(•));  see  examples  in  §3.  Since  we  approximate  Isc  (use)  functions  by  the  piecewise 
polynomial  epi-splines,  to  be  discussed  shortly,  further  motivation  for  the  exponential  transformation 
is  provided  by  the  fact  that  many  of  the  common  densities  are  indeed  exponential  transformations  of 
polynomials.  We  observe  that  the  Isc  and  use  functions  are  measurable  and  consequently  the  integral 
in  (P^)  is  well-defined,  but  possibly  infinite. 

It  is  clear  that  a  solution  /”  of  (P”)  generates  a  density  exp(— /"■(•))  that,  regardless  of  the  sample 
size,  possesses  the  properties  embedded  in  F"",  which  presumably  are  the  properties  of  the  actual  density 
(see  Theorems  4.2  and  4.4  and  Section  5.4  for  a  discussion  of  misspecification) .  Moreover,  it  will  be  a 
best  possible  density  in  terms  of  the  maximum  likelihood  criterion  and  the  set  of  allowable  densities. 

In  view  of  the  above  formulation  and  discussion,  we  are  unable  to  build  on  the  extensive  literature 
on  sieves  and  follow  a  different  path.  We  instead  introduce  a  new  class  of  functions  called  exponential 
epi-splines  from  which  we  can  construct  approximations  independently  of  the  sample.  They  allow  us 
to  substitute  for  the  infinite-dimensional  (F”),  a  finite-dimensional  problem,  guaranteed  to  generate  a 
solution  that  approximates,  to  any  desired  level  of  accuracy,  a  solution  of  (F*^). 

We  start  by  defining  the  central  building  block  of  our  approximation  framework;  see  [57]  for  details. 
A  basie  epi-spline  is  a  function  given  in  terms  of  an  order  p  £  Wq  :=  {0}  U  W,  where  JN  :=  {1,  2, ...} 
and  a  mesh  m  :=  with  fc  =  1,  2, ...,  N,  that  partitions  its  domain  [mo,  uiAr]  in  N 

open  subintervals,  where  on  each  subinterval  the  basic  epi-spline  is  a  polynomial  of  degree  p. 

Our  focus  on  small  samples  and  the  use  of  highly  flexible  candidate  densities  in  (F”)  and  its  epi- 
spline-based  approximations  can  easily  lead  to  overfitting.  This  might  give  the  impression  that  the  mesh 
m  will  become  an  important  tuning  parameter.  However,  since  tuning  the  mesh  might  be  challenging 
and  easily  could  have  become  the  subject  of  arbitrary  decisions,  we  take  another  approach.  We  recall 
that  (F”)  is  the  actual  problem  of  interest  and  the  estimator  is  exp(— /”),  with  /”  being  one  of  its 
solutions.  Since  such  a  solution  is  not  directly  available,  our  effort  is  directed  towards  obtaining  an 
approximation  through  a  “discretization”  of  the  space  F.  The  mesh  should  therefore  be  selected  fine 
enough  to  allow  epi-splines  to  adequately  approximate  the  underlying  space  F  of  Isc  (use)  functions,  or 
possibly  subsets  of  continuous  or  continuously  differentiable  functions,  if  such  restrictions  are  warranted. 
With  this  perspective,  it  becomes  F”  that  needs  to  be  appropriately  defined  to  ensure  that  (F”)  has 
reasonable  solutions  that  avoid  overfitting,  among  other  things.  Since  we  allow  for  arbitrary  constraints, 
there  are  usually  several  ways  soft  information  can  be  brought  in  to  ensure  reasonable  solutions,  which 
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leads  to  flexibility  for  the  analyst;  see  §5  for  examples.  In  most  of  the  paper,  we  therefore  assume  that 
the  mesh  m  is  fixed  and  sufficiently  fine. 

Obviously,  basic  epi-splines  are  structurally  related  to  polynomial  splines,  widely  used  in  engineering 
and  statistical  applications  [70,  18,  45],  as  both  are  piecewise  polynomial  functions.  However,  basic  epi- 
splines  are  more  flexible,  with  continuity  not  required  at  mesh  points,  and  they  can  approximate  any 
extended  real- valued  semicontinuous  function  (see  Theorem  2.4  below).  The  formal  definition  is  stated 
next. 

2.1  Definition  (basic  epi-spline  and  associated  mesh).  A  (basic)  epi-spline  s  :  [mo,misf]  C  M  ^  M 

with  mesh  m  =  and  mesh-grade  jmj  :=  maxi<fc<Ar(mfc  —  mk-i)  is  of  order  p  G  Wq  if  on  each 

subinterval  (mfc_i,  m^)  for  k  =  1, . . . ,  N ,  s  is  polynomial  of  degree  p. 

The  family  of  all  such  epi-splines  is  denoted  by  e-spP(m). 

Exponential  transformations  of  epi-splines  result  in  exponential  epi-splines: 

2.2  Definition  (basic  exponential  epi-spline).  The  family  of  (basic)  exponential  epi-splines  of  order 

p  G  JNq  with  mesh  m  =  denoted  by  x-spF(m),  consists  of  functions  h  :  [mo,mN]  R  of  the 

form  h  =  e“®,  where  s  G  e-spF(m). 

Since  this  paper  deals  with  basic  epi-splines  and  exponential  epi-splines  exclusively,  we  systematically 
drop  “basic”  from  now  on.  The  approximation  of  relying  on  (exponential)  epi-splines  then  takes 

the  following  form  (after  the  customary  switch  to  log-likelihood): 

1  /■"liV 

{Pp,m)  ■  'S"  G  argmin  —  ^  s(X*)  such  that  s  G  C  e-spF(m),  /  =  1, 

^  i—i  J  mo 

where  S’""  is  the  formulation  and  possibly  approximation  of  soft  information  in  terms  of  epi-splines.  In 
this  paper,  we  therefore  examine 

h"'  :=  with  s"'  a  solution  of  (H^^), 

which  is  our  approximation  of  exp(— /"■).  We  refer  to  as  the  exponential  epi-spline  estimator. 
Throughout  the  paper  we  make  the  assumption  that  the  support  [I,  u]  of  the  true  density  is  a  subset  of 
[mo,mAr]. 

It  is  clear  from  the  definition  that  every  s  G  e-spF(m),  with  mesh  m  =  is  uniquely  defined 

by  {p  +  2)A^  -|-  1  parameters^  Consequently,  (Pp^rn)  is  equivalent  to  a  hnite-dimensional  optimization 
problem,  usualy  easily  solved  by  standard  algorithms.  The  next  subsection  provides  the  justification 
for  approximating  (H”)  by  {Pp^rn)-  We  note  that  this  approximation  is  carried  out  for  computational 
reasons,  as  a  means  to  overcome  the  infinite  dimensionality  of  {P^)-  The  hard  and  soft  information  are 
considered  fixed. 

^There  are  N  subintervals  {mk-i,mk)  each  with  a  polynomial  of  degree  p.  This  gives  A^(p-|- 1)  parameters.  In  addition, 
there  are  -|-  1  mesh  points  on  which  an  epi-spline  is  freely  defined  (unless  continuity  is  imposed)  as  motivated  in  §2.3, 
which  leads  to  an  additional  -f  1  parameters.  We  note  that  the  mesh  m  is  hxed. 
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Figure  1:  Examples  of  dp{f,g)  for  epi/  and  epig  with  different  overlaps 

2.2  Approximation  Results 

Approximations  of  extended  real- valued  semicontinuous  functions  by  epi-splines  rely  on  the  refinement 
of  the  mesh  as  made  precise  in  the  next  definition.  This  subsection  is  based  on  [57]. 

2.3  Definition  (infinite  refinement).  Given  the  interval  [l,u],  one  refers  to  a  sequence  of  meshes 

with  my  =  {Z  =  . . .  ,m'^^  =  uj,  as  an  infinite  refinement  if  their  mesh-grade 

\m^\  — >■  0. 

It  is  clear  from  classical  spline  theory  that  continuous  functions  can  be  approximated  by  polynomial 
splines.  We  need  to  go  beyond  continuous  functions  to  extended  real-valued  semicontinuous  functions. 
We  rely  on  the  epi-topology  and  hypo-topology  (sometimes  called  the  Attouch-Wets  topologies),  which 
are  reviewed  here  for  completeness;  see  [56,  §7.1]  for  details.  For  any  I  <  u  €  Ft,  we  denote  by 
Isc-fcns  the  set  of  all  Isc  functions  /  :  [Z,  a]  — )■  J?  excluding  /  =  oo.  For  any  two  functions,  / 

and  g,  in  this  space,  the  epi-distance  d,  is  defined  by  d{f,g)  :=  dp(f,  g)e~f'dp,  where  dp{f,g)  := 
max||,j||<p  \d{x,  epi  /)  —  d{x,  epi  g)  \  and  d{x,  S)  :=  inf^g^  ||x  —  y||  for  S  C  with  ||2;||  :=  (^^  and 

epi  /  :=  {x  =  (x,a)  G  IB?'  \  f{x)  <  a}  being  the  epigraph  of  /  and  similarly  for  epig;  see  Figure  1  for 
an  illustration.  When  the  metric  is  defined  in  terms  of  the  epi-distance,  it  generates  the  epi-topology  on 
Isc- fens ( [Z, u]):  (Isc- fens ( [Z, «]), d)  is  a  Polish  (complete  separable  metric)  space  [56,  Theorem  7.58],  [1, 
§5].  A  sequence  of  functions  in  Isc-fcns  ([Z,ri])  epi-converge  to  /  if  their  epigraphs  set-converge,  i.e., 
in  the  sense  of  taking  Painleve-Kuratowski  limits  [56,  §7.B],  which  by  [56,  Theorem  7.58]  takes  place  if 
and  only  if  d(/^,  /)  — 0. 

When  dealing  with  use  functions,  use- fens  ([Z,  u])  ,  now  excluding  the  function  =  —  oo,  after  observing 
that  hypograph  of  a  function  /,  hypo/  :=  {(x,a)  |  f{x)  >  a}  is  just  a  mirror  image  of  the  epigraph 
of  — /,  one  can  mimic  the  definitions  and  constructions  described  for  Isc  functions  to  set  up  the  hypo- 
distanee  dhypo(/)9)  :=  d{—f,  —g),  between  any  two  functions  /  and  g  and  generate  the  hypo-topology 
which  again  makes  (usc-fcns([Z,  tt]),  dhypo)  a  Polish  space.  A  sequence  of  functions  /^  hypo-eonverge  to 
/  if  —f^  epi-converge  to  — /.  The  relationship  between  epi-  and  hypo-convergence  and  other  modes 
are  convergence  in  the  present  context  is  examined  below;  see  also  [56,  Chapters  4  &  7]  and  [57]  for 
broader  treatments. 

Since  the  supremum  of  an  use  function  on  a  compact  set  is  attained,  the  consideration  of  use  densities 
naturally  arises  in  applications  where  the  subsequent  use  of  the  densities  involves  maximization,  such 
as  for  the  purpose  of  finding  their  modes.  Similarly,  the  Isc  densities  is  the  natural  class  to  consider 
in  the  context  of  subsequent  minimization.  We  next  state  an  approximation  results  for  exponential 
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epi-splines. 


2.4  Theorem  (Isc  and  use  dense  approximations  [57]).  For  any  p  G  JNq  and  an  infinite 

refinement  of  [Z,u],  under  the  hypo-topology, 

(1^  x-spF(m'^)  usc-fcns([Z,  nj)  is  dense  in  {e“®  |  s  G  lsc-fcns{[l,u])} 

u&N  / 

and  under  the  epi-topology, 

(1^  x-spF(m^)  isc-fcns([/,  u])  is  dense  in  {e“®  |  s  G  usc-fcns{[l,u\)}. 
u&N  / 

Consequently,  for  a  sufficiently  fine  mesh  and  regardless  of  the  order,  exponential  epi-splines  provide 
arbitrarily  accurate  approximations  of  exp(— /(•)),  /  G  lsc-fcns([Z,  u])  and  /  G  usc-fcns([/, «]).  In  the 
remainder  of  the  paper,  we  therefore  mainly  focus  on  (Ppjn)  for  fixed  p  and  m. 


2.3  Computations,  Existence,  and  Uniqueness 

We  now  turn  to  a  convenient  representation  of  epi-splines,  also  given  in  [57],  which  plays  an  essential  role 
in  computations  and  analysis.  This  leads  to  a  finite-dimensional  optimization  problem  for  computing 
the  estimator  /i”,  which  we  then  analyze. 

Every  s  G  e-spF(m),  with  m  =  {m,fc}^g,  is  uniquely  represented  by  an  epi- spline  parameter 
r  :=  {so,...,SN,ai,...,aN),  Sk  G  M,  k  =  0, N,  G  /c  =  1, ...,  M, 


such  that  for  any  x  G  [mo,mAr], 


s{x)  :=  {cp^rn{x),r), 

where  {z,z')  :=  YliZiz'i  and  Cp^m  ■  [nxo,mN]  — >•  is  defined  by 

^  _ f  (^7V-i-i-i-(p+i)(fc— 1) )  Xk )  Xf^, . .. ,  Xj^ ,  fc) ) )  if  X  G  (Tnjf_\ ,  k  1, .. .,  N 

\(0fc,  l,OAr_fc+(p+i)Ar),  if  X  =  rUfe,  /c  =  0,  ...,  A^, 


with  Xk  =  X  —  ruk-i  and  0^  denoting  the  Udimensional  zero  vector,  k  G  JN,  and  Oq  being  a  term  that 
is  omitted.  This  representation  of  an  epi-spline  s  lets  the  first  -|-  1  components  in  the  vector  r  be 
the  values  of  s  on  m.  The  remaining  [p  +  1)A^  components  are  divided  into  N  blocks  of  {p  +  l)-tuples, 
each  of  which  gives  the  coefficients  of  the  polynomial  defining  s  on  intervals  of  the  form  (mfc_i,mfc). 
Specifically,  ak  =  ■■■■,CLk,p)  is  such  that 

p 

s{x)  =  ^  afc,i(x  -  mk-if,  for  x  G  m^).  A:  =  1, 2, ...,  N. 

i=0 

Since  the  first  A^  -|-  1  components  of  r  determine  the  value  of  an  epi-spline  only  on  m,  which  consists  of 
a  finite  number  of  points,  we  refer  to  the  remaining  {p+  1)A^  components  of  r  as  the  essential  epi-spline 


parameter  and  write  r  =  (rmeshj  ^ess),  with  r^esh  S  and  ress  S  ,  to  indicate  this  partition 

of  r.  Correspondingly,  we  let  Cp,m  =  (cmesh,  Cess)- 

Since  the  value  of  a  density  at  a  finite  number  of  points  is  immaterial  for  the  characterization  of 
the  corresponding  probability  distribution,  it  may  at  first  appear  unnecessary  to  specify  the  value  of 
an  exponential  epi-spline  Instead  of  determining  r  =  (umesh)  ^ess))  one  could  simply 

focus  on  Tess  and  this  is  certainly  the  case  for  continuous  exponential  epi-splines.  However,  in  the 
discontinuous  case  the  situation  is  more  subtle.  Since  we  consider  functions  in  lsc-fcns([^,  u]),  which  are 
defined  on  the  whole  [l,u],  their  approximations  should  also  be  defined  on  the  whole  [l,u].  In  addition, 
we  would  like  to  handle  soft  information  such  as  bounds  on  the  values  of  a  density  estimate  at  particular 
points,  including  at  m.  Hence,  we  find  it  most  convenient  to  consider  the  value  of  epi-splines  at  the 
mesh  independently  and  proceed  with  the  more  general  framework  involving  rmesh- 

We  note  that  convergence  in  the  epi-spline  parameter  is  equivalent  to  uniform  convergence  of  the 
corresponding  exponential  epi-splines  and,  under  a  restriction  to  use  functions,  also  to  convergence  in 
the  hypo-distance.  Specihcally,  if  €  x-spl^(m),  with  m  =  =  e~^''  =  ^ 

and  hP  =  ^  then  the  following  hold  [57]: 

r^  hP  — >■  hP  uniformly  on  [mo,  mAr] 

—hP)  — )•  0  d{s'^ ,  s°)  — )•  0. 

Moreover,  if  /i^,  hP  are  use,  then  also 

hP  — >■  hP  uniformly  on  [mo,mAr]  d{—h'^,  —hP)  — >•  0. 


We  observe  that  since  the  hypo-distance  does  not  distinguish  between  a  function  and  its  use  regu¬ 
larization  (see  Proposition  7.4  in  [56]),  uniform  convergence  cannot  generally  be  implied  from  hypo- 
convergence,  even  for  exponential  epi-splines. 

We  next  deal  with  existence  and  uniqueness  of  the  estimator  /i"'  =  exp(— s"'(-))  and  consider  a 
computational  convenient  equivalent  form  of  using  the  representation  s  =  We 

consider  a  realization  of  with  replaced  by  observed  values  x^, . . .  ,x",  and  5"  is  now 

a  realization  of  the  random  constraint  set.  Since  the  meaning  is  clear  from  the  context,  we  denote 
realizations  also  by  We  let  C  be  the  set  of  epi-spline  parameters  corresponding 

to  the  set  of  epi-splines  S'",  i.e.. 


i?"  :=  {r  G  1  {cp,mi-),r)  G  5"}; 


for  example,  if  S"  =  e-spF(m),  then  i?"  =  ][l(P+P^+^ ,  When  incorporating  soft  information,  i?"  and 
S"  become  more  restrictive  as  we  see  in  §3.  We  let  both  the  random  set  and  its  realizations  be  denoted 
by  7?".  We  also  let 


R^}  :=  ■Ire  R 


rmjv 

/ 

-'mo 


As  stated  next,  {Pp^rn)  is  equivalent  to  the  finite-dimensional  problem: 


1 

min  — 
rGii"  n 


n 

^{Cp^m{xP,r). 

i=l 


Clearly,  a  realization  . . .  x"  and  S"  generates  a  realization  (Pp^)  as  well  as  a  corresponding  realiza¬ 
tion  {Pp,m)- 
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2.5  Theorem  (computing  estimates).  Given  p  and  m  =  for  every  corresponding  realizations 

{Pp,m)  and  one  has 

(i)  If  s”  G  e-spl^(m)  is  optimal  for  ^hen  there  exists  an  r”  G  1?(p+2)^+i  optimal  for  {Pp^m) 

with  s”  =  {cp^m{-),r^)- 

(a)  If  r"-  G  l?(P+2)^+i  is  optimal  for  {Pp^rn)j  then  s"  =  {cp^rn{-),r^)  is  optimal  for  (Pp^m)  and  the 
exponential  epi-spline  estimator 


h^{x) 


e  xG[mo,mAr] 

0,  otherwise. 


(in)  If  is  nonempty  and  is  compact,  then  {Pp^m)  ^as  an  optimal  solution. 

Proof:  The  equivalence  of  (P^^)  (Pp^m)  follows  directly  from  the  representation  s  =  {cp^rn{-) ,  i") ■ 
The  existence  of  an  optimal  solution  of  [Ppm)  follows  trivially  from  the  continuity  of  the  involved 
functions  and  the  compactness  of  RI^ .  □ 

While  the  objective  function  in  (P^^)  linear,  P"  may  be  nonconvex.  Hence,  (P^^)  could  pos¬ 
sess  local  minimizers  that  are  not  globally  optimal,  increasing  the  complexity  of  solving  the  problem 
numerically.  We  see  in  §3  that  P”  is  often  a  polyhedron  or  at  least  convex.  Hence,  the  main  difficulty 
in  (P^m)  is  associated  with  the  integral  constraint.  However,  under  broad  conditions  stated  next,  that 
constraint  can  be  relaxed  as  utilized  in  other  contexts  earlier  (see  for  example  [34]).  These  conditions 
essentially  imply  that  r  can  be  improved  whenever  the  corresponding  function  integrates  to  a  number 
less  than  one. 

2.6  Definition  A  realization  (Pp^)  aaid  to  be  loosely  constrained  if  for  every  r  G  P”  with 

<  1,  there  exists  r'  G  P”  with  Yl'i=iiop,m{P),P  —  r)  <  0. 

The  following  Proposition  2.8  and  §3  provide  examples  of  loosely  constrained  realizations.  We  give  an 
immediate  consequence  next. 

2.7  Proposition  Suppose  that  a  realization  {Pp^rn)  loosely  constrained.  Then,  that  realization  and 
the  corresponding  relaxed  problem 


1  “  rruN 

{rlxPp  ^,}  :  min  —  ^(cp^m(P)T)  such  that  J  <  1 

have  identical  sets  of  optimal  solutions.  Moreover,  if  R^  is  convex,  then  {rlxPp  „^)  is  a  convex  problem. 

In  Theorem  4.7  below  we  show  that  even  beyond  loosely  constrained  realizations,  the  consideration  of 
(rlxPpp^)  is  justified.  In  view  of  the  preceding  discussion  and  results,  it  is  clear  that  the  exponential  epi- 
spline  estimator  is  computationally  tractable  by  means  of  well-developed  convex  optimization  algorithms 
in  many  practical  situations  and  by  means  of  nonlinear  programming  algorithms  in  even  more  situations. 
In  some  cases,  for  example  when  RT  is  polyhedral,  some  further  computational  benefits  may  arise  from 
utilizing  the  following  reformulation,  which  is  valid  under  additional  assumptions;  see  §3  for  examples. 
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The  next  result  also  gives  a  sufficient  condition  for  a  realization  {Pp^rn)  to  be  loosely  constrained.  We 
use  the  notation  to  indicate  the  {{p  +  2)N  +  l)-dimensional  vector  consisting  of  zeros,  except  at 
entries  1  through  +  1  as  well  as  entries  +  2  +  (fc  —  l)(p  +  1),  /c  =  1,  2, N ,  where  it  is  unity. 


2.8  Proposition  A  realization  {Pp^rn)  tor  which  every  r  S  i?"  and  j3  €  IR  satisfy  r  +  /dlp^N  S  is 
loosely  constrained  and  its  set  of  optimal  solutions  is  identical  to  that  of  the  corresponding  penalized 
problem 

1  ”  grriN 

{pnlP^J  :  min  -  y^{cp,m{P) ,  r)  +  / 

rsR-  n  ^ 

Moreover,  if  R^  is  convex,  then  {pnlPp  „i)  is  a  convex  problem. 

Proof:  We  consider  corresponding  realizations  (P^m)  and  let  r  G  R^  satisfy 

fmo^  =  7  <  1.  For  r'  =  r  +  (log  7)  Ip, w, 


e-i^P,mipy)dx  =  - 
7 


g-(cp,^(x),r)^^  =  1. 


(1) 


Moreover,  YA=ii'^p,m{P),r'  -  r)  =  I]r=i(cp,m(P),  (log7)lp,Ar)  =  ralog7  <  0.  Since  r'  G  RJ^  by  assump¬ 
tion,  [Ppm)  is  loosely  constrained  by  Definition  2.6. 

We  next  consider  the  penalized  problem.  For  any  r  E  let 


1  '''  pmN 

”  j=l  -^^0 


and  let  f  G  P”  be  arbitrary.  Since  every  epi-spline  is  piecewise  polynomial  and  therefore  integrates  on 
[mo,  mw]  to  a  finite  number,  there  exists  a  7  G  (0,  00)  such  that  gy  assumption, 

f  +  (log  7)  Ip,  AT  G  and,  following  the  same  argument  as  in  (1), 


g-{cp,m(a;),r-|-(log7)lp,jv>^^  _ 


Consequently,  f  +  (log 7) Ip, w  is  feasible  in  (Pp^m)-  Suppose  that  is  optimal  for  {Pp^ml-  It  follows  that 
r”  also  minimizes  /”  on  RJj  because  this  problem  deviates  from  {Pp^rn)  only  by  the  constant  one  in  the 
objective  function.  Using  an  argument  similar  to  that  of  Lemma  2.3  in  [34],  we  find  that 

r  (r)  -  nn 

=  -  y^(cp,m(P)u  +  (log7)lp,Af)  -  iog7  +  /  - /”(r") 

=  f"'ir  +  (log  7)lp,Af)  -  log  7  -  1  +  7  -  rir^) 

>  -  log  7  -  1  -t-  7, 


where  the  inequality  follows  from  the  fact  that  r”  is  optimal  and  f  -|-  (log  7)  Ip, w  is  feasible  in  (P^^)- 
Since  —  log7  —  1  -|-  7  >  0  for  7  G  (0,  00),  7  ^  1,  we  find  that  every  r  G  R^  with  i 
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has  /”(r)  >  /"(r”)  and  consequently  cannot  minimize  /”  on  R^.  The  first  conclusion  then  follows. 
Convexity  of  {pnlPp  „^)  follows  directly  from  the  convexity  of  the  integral  term.  □ 

In  general,  one  cannot  expect  a  unique  optimal  solution  of  a  realization  {Pp^m)i  consequently  a 
unique  exponential  epi-spline  estimate,  due  to  the  flexibility  in  the  choice  of  values  of  the  epi-spline  on 
a  mesh  that  is  not  a  subset  of  the  sample  realization  ...,  x"'.  In  fact,  if  the  first  +  1  components 

of  the  epi-spline  parameter  r  are  not  constrained  by  then  there  is  an  infinite  number  of  optimal 

solutions  whenever  one  exists.  The  next  result  shows  that  when  these  values  are  uniquely  determined 
by  the  essential  epi-spline  parameter,  uniqueness  may  still  be  achieved.  Such  a  dependence  on  the 
essential  epi-spline  parameter  is  manifest,  for  example,  in  the  case  of  continuous  epi-splines  used  when 
dealing  with  densities  known  to  be  continuous. 


2.9  Proposition  Suppose  that  corresponding  realizations  {Pp^rn)  have  convex, 

{x^, ...,  x"'}  n  m  =  0,  and  satisfy  the  condition: 

(r mesh,  r ess),  ir'mesh,  Kss)  ^  ^ith  Vess  =  r'^ss,  RnpHeS  Tmesh  =  Cesh- 

Then,  the  following  hold: 

(i)  If  an  optimal  solution  r  of  the  realization  (rlxPp^)  is  in  RJ^,  then  there  are  no  other  optimal 
solutions. 


(a)  The  realization  {pnlPp  „^)  has  at  most  one  optimal  solution. 


Proof:  We  start  by  showing  strictly  convexity  of  the  integral  term  as  a  function  of  the  essential  epi- 
spline  parameters.  Given  m  =  {mk}^=Q,  we  define  '0  :  — >  IR  and  p  :  [mo,mAr]  x  ]R^p+^1^  — >  IR 

by 


'^(re 


fm-N 

)  :=  /  ip{x,ress)dx,  with  ip{x,ress)  ■= 
J  mo 


For  all  X  G  [mQ,m]v]  and  ress,p'ess  ^  ,  twice  differentiation  with  respect  to  the  second  argument 

in  ip  gives  that 


vV(x,ress)rLs)  =  (cess(x),  >  0. 


Suppose  that  ^  0.  Then,  there  exists  a  k  €  {1,2,  ...,A^}  such  that  (cess(a^))  ^  polynomial 

in  X  for  x  G  with  not  all  coefficients  zero.  Hence,  there  exists  a  subset  of  with 

positive  Lebesgue  measure  on  which  {cess{x) ,  ^  0  and 


f-mjv 


(Ce 


>  0. 


/mo 


(2) 


Since  the  dominated  convergence  theorem  implies  that  the  left-hand  side  of  (2)  equals  V^V'(’'ess)^Ls)5 
we  find  that  ijj  is  strictly  convex  by  the  second-order  condition  for  convexity. 

We  let  ^  =  (1/n)  X]”=i(cess(a^*)) ')  +  which  is  therefore  also  strictly  convex. 

We  first  consider  (ii).  Suppose  for  the  sake  of  a  contradiction  that  there  exist  r  =  {rmesh,'>'ess)  7^ 
r'  =  (^mesh’^ess)  that  both  are  optimal  for  the  realization  {pnlPp„,),  with  optimal  value  v*.  Since 
{x^,...,x"'}  n  m  =  0,  the  objective  function  in  this  problem  depends  only  on  the  essential  epi-spline 
parameter  and,  in  fact,  V’(^ess)  =  V’(^Ls)  =  We  consider  two  cases. 
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a)  Suppose  that  ress  =  ?'Ls)  but  then  r^esh  =  ^mesh  by  assumption  and  we  contradict  the  hypothesis 
that  r  ^  r' . 

b)  Suppose  that  ress  7^  Since  V’  is  strictly  convex,  there  exists  a  unique  minimizer  r'^s  of  V’  over 
the  convex  hull  of  ress  and  r'^g.  Moreover,  there  exists  an  a  G  (0, 1)  such  that  r'gg  =  aress  +  (1  — 

and  ip{r"gg)  <  v* .  By  the  convexity  of  R"",  r"  =  (armesh  +  (1  “  ®)’'mesh’ ^ess)  ^  and  its  objective 
function  value  in  (pnlPp^^)  is  ipir'^ss)  <  which  contradicts  the  optimality  of  v*. 

Second,  we  focus  on  (i).  Suppose  that  r  =  (rmesh,  ^ess)  G  Rj  is  optimal  for  the  realization  {rlxPp  „^). 
We  consider  two  cases. 

a)  Suppose  that  dx  >  1  for  all  r'  G  R^.  Then  by  strict  convexity  of  'll),  there  exists 

a  unique  minimizer  r"gg  of  -0  on  {r'gg  G  |  (^meshTess)  G  R"^  for  some  G  However, 

r'gg  =  ress  because  '^(ress)  =  1-  Another  optimal  solution  for  the  realization  (rlxPp  ,^)  would  thus  have 
essential  epi-spline  parameter  identical  to  ress-  However,  by  assumption,  such  a  solution  would  then 
also  be  identical  to  r  in  the  remaining  components,  which  implies  it  coincides  with  r. 

b)  Suppose  that  there  exists  r'  G  R^  such  that  e-{<^p,m{x),r  )(ix  <c  1.  Then,  the  Slater  constraint 
qualification  is  satisfied  and  there  exists  a  multiplier  A  >  0  such  that  the  realization  {rlxPp  „^  has  the 
same  set  of  optimal  solutions  as  the  problem 


1 

mm  — 
rGi?"  n 


n 

2=1 


(3) 


Repeating  the  arguments  that  lead  to  (ii),  with  (3)  in  place  of  the  realization  (pnlPp^^),  shows  that 
there  are  no  other  optimal  solutions  of  the  realization  {rlxPp  „^)  than  r.  □ 


3  Soft  Information 

We  implement  soft  information  about  the  density  under  consideration  in  the  estimation  problem  (Ppm) 
through  the  set  R"^,  which  can  be  any,  possibly  random,  subset  of  .  It  is  observed  empirically 

and  also  illustrated  in  §5  that  soft  information  tends  to  improve  density  estimates.  In  this  section, 
we  give  a  soft  consistency  theorem  that,  in  part,  explains  these  observations.  We  also  give  examples 
of  constraints  for  specific  instances  of  soft  information.  We  start,  however,  with  a  convenient  result 
regarding  the  Kullback-Leibler  divergence. 

Let  dKL{h\\g)  denote  the  Kullback-Leibler  divergence  from  a  density  /i  to  a  density  g  defined  on 
M,  i.e.,  dKL{h\\g)  :=  /i(x)  log  Here  and  below  we  make  the  standard  interpretation  that 

/3i  log  (/3i  7/32)  =  0  when  /3i  =  0  regardless  of  the  value  of  /32  G  1?  and  (di  log/3i//32  =  oo  when  /3i  >  0 
and  /32  =  0.  An  immediate  consequence  of  the  dehnition  of  the  divergence  is  the  following  result,  which 
facilitates  formulation  of  certain  soft  information  as  well  as  theoretical  results  below. 

3.1  Proposition  Suppose  h  and  are  densities  with  s  =  {cp^rn{-) ,  r)  G  e-spK(m),  m  = 

Then, 

j  r-rriN  \  r-oa 

dKL{h\\e~'')  =  (  I  Cp^rn{x)h{x)dx,r)+  {logh{x))h{x)dx. 

\  J  mo  /  J  —oo 
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If  in  addition  h  =  e  with  s'  =  {cp^rn{-)if')  G  e-spF(m),  then 


dKL{h\\e 


f-mjv 


Cp,m{x)h{x)dx,  r  —  r' 


'mo 


The  next  theorem  is  a  direct  consequence  of  Proposition  3.1. 


3.2  Theorem  (soft  consistency).  If  the  true  density  hP  =  with  and  there  exists  a 

p  >  0  such  that  ||r  —  r'\\  <  p  for  all  r,  r'  G  BP ,  then  the  estimate  /i”’''  obtained  from  solving  a  realization 
{Pp,m)  satisfies 


dKiih^Wh^^n  < 


Cp^m{x)hP  {x)dx 


P- 


Moreover,  for  any  fixed  n,  if  p  — )■  0,  then  — )•  hP  uniformly  on  [mo,  ttin]- 


An  effective  strategy  for  improving  exponential  epi-spline  estimates  is  therefore  to  reduce  the  size  of 
R^,  of  course,  without  eliminating  the  true  epi-spline  parameter. 

We  next  illustrate  the  wide  range  of  soft  information  that  is  easily  included  within  the  exponential 
epi-spline  framework^. 


Support  bounds  and  mesh.  The  choice  of  mesh  m  =  accounts  for  support  bounds  and  mo 

and  mjy  should,  ideally,  correspond  to  the  lower  and  upper  bounds  of  the  support  of  the  true  density, 
respectively.  If  these  are  unknown,  conservative  values  could  be  used  as  our  ability  to  approximate 
extended  real- valued  functions  does  not  rule  out  the  possibility  of  vanishing  densities  on  [mo,  mAr],  even 
for  the  exponentially  transformed  kind.  In  practice,  mo  and  m^v  can  be  selected  such  that  the  observed 
sample  is  well  within  [mo, mAr].  The  mesh  is  often  selected  to  be  uniform,  but  the  methodology  offers 
much  flexibility  and  soft  information  about  possible  locations  of  discontinuities,  for  example,  could  lead 
to  other  choices.  Consequently,  the  mesh  is  selected  essentially  independently  of  the  sample  and  one 
should  simply  focus  on  having  a  mesh  that  is  sufficiently  fine  to  allow  epi-splines  to  approximate  the 
underlying  functions  with  a  sufficient  accuracy.  Of  course,  some  restrain  on  mesh  refinement  might  be 
imposed  by  computational  considerations.  The  number  of  decision  variables  in  the  resulting  optimiza¬ 
tion  problem  grows  linearly  in  N. 


Semi-continuity,  continuity  and  smoothness.  It  is  straightforward  to  ensure  use,  Isc,  continuity, 
and  various  degrees  of  differentiability  through  linear  constraints;  see  [58]  for  details.  The  inclusion 
of  such  constraint  will  keep  a  problem  loosely  constrained  as  the  sufficient  condition  for  being  loosely 
constrained  in  Proposition  2.8  is  satisfied. 


We  recall  that  the  epi-spline  parameter  is  of  the  forms 

r  =  {sq,  Si,  ...,  sn,  ai,o,  «i,i;  ai.p,  «2,0j  «2,i,  •••,  «2,p,  ■•••,  “Af.o,  •■•5 

where  the  first  -|-  1  components  specify  the  value  of  the  epi-spline  at  the  mesh  points  rriQ,  mi,  ..., 
mi\f  and  the  remaining  N  blocks  of  p  -|-  1  components  give  the  polynomial  of  order  p  in  each  interval 

^Naturally,  with  the  possibility  of  including  incorrect  soft  information,  there  is  a  need  for  validation.  Although  impor¬ 
tant,  we  limit  the  discussion  of  this  topic  to  Theorems  4.2,  4.4,  and  4.7  as  well  as  §5.4;  see  for  example  [63]  and  [9]  for 
tests  in  related  contexts. 
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(mfc_i,mfc),  k  =  1,2, 


Slope  information.  The  quantity  h' (x)'^ / h{x)dx  is  a  “measure  of  smoothness”  that  is  easily 
expressed  in  terms  of  the  epi-spline  parameter,  but  upper  and  lower  bounds  on  this  expression  result 
in  undesirable  nonconvex  constraints.  However,  an  alternative  “normalization,”  which  also  squares  the 
denominator,  results  in  a  convex  constraint.  Specifically,  if  /i  =  then 

^  rrrH;  /  P  •  \  ^ 

{h'{x)/h{x))‘^dx  =  '^  I  ['^iak,i{x-mk-iy~^\  dx. 

k=iJ^k-i  Vi=i  / 

An  upper  bound  on  this  quantity  results  is  a  convex  constraint.  In  some  application,  one  may  also  seek 
bounds  at  X  €  {mk-i,mk)  by  restricting 


p 

h'{x)/h{x)  =  -{c'p^rn{x),r)  = 

i=l 


and / or 


h''{x)/h{x)  =  -^i(i-l)afc,i(x-mfc_i)*  ^ 


i=2 


V  j  =  l 


Upper  and  lower  bounds  on  the  first  quantity  result  in  linear  constraints  and  upper  bounds  on  the 
second  quantity  gives  a  quadratic  convex  constraint.  The  constraints  could  be  imposed  at  any  num¬ 
ber  of  values  of  x,  but  we  note  that  if  p  =  2  and  the  density  is  log-concave,  as  describe  below,  and 
continuously  differentiable,  then  lower  bounds  on  h'{x)/h{x)  at  mi,  m2,  ■■■,  m^  suffices  to  ensure  that 
the  constraints  are  satisfied  for  all  x  €  [mo,mAr].  Similarly,  an  upper  bound  on  h'{x)/h{x)  needs  only 
be  imposed  at  mo,  mi,  ...,  mAr_i.  The  inclusion  of  the  pointwise  constraints  keep  a  problem  loosely 
constrained  as  the  sufficient  condition  for  being  loosely  constrained  in  Proposition  2.8  is  satisfied.  We 
observe  that  constraints  on  h'{x)/h{x)  is  an  effective  way  of  controlling  the  “tails”  near  mg  and  m^- 


Monotonicity.  We  achieve  a  nondecreasing  (nonincreasing)  density  by  imposing  nonnegativity  (non¬ 
positivity)  on  h'{x)/h{x)  for  all  x  G  {mk-i,mk),k  =  1,2,  ...,N  as  well  as 

p 

Sk-i  >  {<)ak,o,  Sk  <  i>)'^ak,iimk  -  mk-iY,  k  =  l,2,...,N. 

i=0 

Again,  simplifications  arise,  for  example,  if  p  =  2  and  the  density  is  log-concave.  Then,  it  suffices  to 
impose  that  -|-  2ak,2{pnk  —  Pnk-i)  <  0  (a^q  >  0),  A:  =  1,2,  ...,N.  Again,  a  problem  remains  loosely 
constrained  after  the  inclusion  of  these  constraints. 


Log-concavity.  We  recall  that  h  =  is  log-concave  if  and  only  if  {cp^rnY)-:'’^)  is  convex.  This 

condition  is  ensured  if  {cp^mY)^'’^)  is  (i)  continuous,  (ii)  for  A:  =  1,2,  ...,N  —  1,  its  left  derivatives  at  m^ 
is  no  larger  than  its  right  derivative,  i.e., 

p 

iak,i{mk  -  mfc_i)*“^  <  a^+iq,  A:  =  1,  2, ...,  -  1, 

i=l 
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and  (iii)  on  each  k  =  1,2,  {cp^rn{-),'>')  is  convex,  i.e., 

p 

i{i  -  l)ak,i{x  -  >0,  A:  =  1,  2,  x  €  (mfc-i,  mfc). 

i=2 

Here,  the  obvious  interpretations  are  required  when  p  =  0, 1.  The  latter  condition  simplifies  to  ak^2  ^  0, 
k  =  1,2,...,  A^,  when  p  =  2.  Hence,  in  that  case,  the  condition  of  log-concavity  requires  only  a  finite 
number  of  linear  constraints.  Again,  the  problem  remains  loosely  constrained. 

Unimodality  and  locations  of  modes.  We  implement  soft  information  about  unimodality  of  a 
continuous  density  by  designating  one  mesh  point  as  the  mode,  and  then  constraining  the  density 
to  be  increasing  and  decreasing  on  (mfc/_i,mfc/)  and  (mfc/, respectively,  and  nondecreasing  on 
a-nd  nonincreasing  on  Solving  the  resulting  estimation  problem  gives  a  candidate 

density.  The  process  is  repeated  for  alternative  mode  locations  rrik,  k  =  0, 1,...,A^,  k  ^  k' ,  and  the 
density  with  the  largest  likelihood  is  retained  as  the  estimate.  The  same  result  is  obtained  by  solving 
a  single  augmented  problem  involving  iV  +  1  binary  variables.  AT-modality  is  achieved  similarly  by 
partitioning  [mo,mN]  into  K  intervals,  with  each  having  a  unimodal  constraint.  The  process  must  be 
repeated  for  each  partition  of  interest.  To  specify  that  certain  ruk  are  modes  is  achieved  by  ensuring 
that  the  density  is  increasing  and  decreasing  on  (mfc_i,mfc)  and  {mk,mk+i),  respectively. 

Symmetry.  We  ensure  symmetry  by  designating  a  point  of  symmetry  and  then  solving  only  for 
the  upper  half  of  the  density  on  [m^ ,  m^v] ,  with  trivial  changes  to  the  likelihood  function  and  integral 
constraint.  The  process  is  repeated  for  each  possible  symmetry  point.  Again,  auxiliary  binary  variables 
would  obtain  the  same  effect  within  one  augumented  formulation. 

Bounds  on  density  values.  It  is  straightforward  to  impose  pointwise  upper  and  lower  bounds 
and  on  the  value  of  h{x)  =  with  0  <  <  h^^^{x)  <  oo.  It  suffices  to  set 

p 

-  log  ^  ak,iix  -  nik-iY  >  -  log  for  x  G  {mk-i,mk) 

i=0 

and 

—  log  >  Sfc  >  —  log  for  X  =  mk,  /c  =  0, 1, ...,  N. 

While  these  constraints  are  linear,  they  do  not  satisfy  the  assumption  of  Proposition  2.8.  However,  if 
only  the  lower  bound  h{x)  >  is  imposed,  the  resulting  problem  remains  loosely  constrained. 

Kullback-Leibler  divergence  and  the  Bayesian  paradigm.  Proposition  3.1  provides  a  convenient 
form  of  implementing  soft  information  about  a  reference  density  In  a  Bayesian- like  paradigm, 

suppose  that  we  seek  a  density  that  is  “near”  which  for  example  could  correspond  to  the  posterior 
mean  obtained  through  Bayes  theorem.  Then,  a  constraint 

<  ip{n),  (4) 

indeed  ensures  that  the  estimate  h”  is  within  ip{n)  of  h'’®^  as  measured  by  the  Kullback-Leibler  diver¬ 
gence.  If  h'"®^  resulted  from  Bayes  theorem,  then  these  constraints  allow  for  some  flexibility  to  explore 
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densities  near  the  one  prescribed  by  a  classical  Bayesian  approach.  In  view  of  Proposition  3.1,  this 
constraint  is  linear  in  r  and  thus  easily  implement  able.  Here,  ip  :  JNq  — >■  [0,oo)  is  the  cognitive  content 
of  the  reference  density  and  should  satisfy  (^(0)  =  0,  lim„_j.oo  P’in)  =  oo,  and  be  increasing  since 
an  increasing  sample  size  should  place  gradually  less  emphasis  on  Of  course,  if  ip{n)  =  0,  then 

(P^  p)  simply  returns  /i^®^,  or  a  density  that  deviates  at  most  on  m.  If  ip{n)  =  oo,  then  no  information 
about  the  reference  density  is  included.  While  technically  not  correct  in  the  sense  of  classical  Bayesian 
statistics,  one  can  also  view  h’’®^  as  a  “prior”  density  and  the  resulting  density  /i”  obtained  from 
as  the  “posterior”  density.  Of  course,  a  constraint  P  for  some  k  >  0  is  also 

easily  imp  lamentable,  and  could  be  relevant  in  contexts  where  a  “diversity”  of  densities  is  sought.  For 
example,  one  may  be  concerned  with  the  validity  of  the  soft  information  imposed  in  an  initial  estimate 
of  a  density  and  seek  a  set  of  alternative  densities  that  are  some  distance  away  from  the  original  esti¬ 
mate;  see  §5.2  for  an  example. 

Bounds  on  moments.  Soft  information  may  result  in  constraints  on  the  j-th  moment  of  the  form 
^mm  ^  where  G  M,  are  given  constants.  The 

right-most  inequality  results  in  a  convex  constraint  in  r,  while  the  left-most  in  a  nonconvex  constraint. 

Bounds  on  cumulative  distribution  functions.  Suppose  that  the  cumulative  distribution  function 
of  h  =  at  7  G  [mo,mN]  must  lie  between  the  lower  bound  and  the  upper  bound 

This  results  in  the  two  convex  constraints  and  <  1— 

4  Consistency,  Asymptotics,  and  Error  Bounds 

Being  concerned,  from  now  on,  with  asymptotics,  we  again  view  (Pp^m)  to  be  a  random  optimization 
problem,  i.e., 

1  fm-N 

(P^rn)  ■  ~  such  that  /  dx  =  1; 

whose  random  elements  are  the  variables  X^, and  the  random  set  S^;  we  still  designate  a  solution 
by  s"'  which  is  now,  itself,  a  random  epi-spline.  To  achieve  consistency,  derive  asymptotics  and  other 
results,  we  view  {(Pj^m)}^i!  given  m  and  p,  as  a  sequence  of  random  optimization  problems  that 
under  quite  general  assumptions  converges  in  some  sense  to  a  limiting  optimization  problem,  whose 
optimal  solution  recovers  a  true  density  G  x-spF(m),  as  the  sample  size  n  — >■  oo.  We  note  that 
the  restriction  to  x-spF(m)  for  given  m  and  p  is  justified  by  Theorem  2.4,  but  we  also  discuss  the 
consideration  of  densities  beyond  this  broad  class;  see  Theorem  4.4  below. 

We  define  the  “approximation”  of  a  density  h  by  an  exponential  epi-spline  as  follows. 

4.1  Definition  (Kullback-Leibler  projection).  For  any  density  h  on  M  and  family  e-spF(m),  m  = 
the  Kullback-Leibler  projection  of  h  on  e-spF(m)  is  the  set 

Sp^m{h)  '.=  argmin  dKL{h\\e  such  that  /  e  ^^^'dx  =  l.  (5) 

5Ce-spl^(m)  J  mQ 

If  the  minimization  is  further  constrained  by  s  tz  S  G  e-spF(m),  then  we  denote  the  set  of  optimal 
solutions  by  Sp  „^{h)  and  refer  to  it  as  the  Kullback-Leibler  projection  relative  to  S. 
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We  see  that  Sp^rn{h)  is  the  set  of  epi-splines  that  gives  the  “closest”  exponential  epi-spline  densities  to 
h  in  the  sense  of  the  Kullback-Leibler  divergence.  It  is  well  known  that  dKL{h\\g)  >  0  for  all  densities 
h  and  g,  and  that  dKL{h\\g)  =  0  if  and  only  if  h  =  g,  except  possibly  on  a  set  of  Lebesgue  measure 
zero.  Hence,  if  a  density  h  =  €  x-spF(m),  m  =  {mk}^^Q,  then  s  €  Sp^m{h)  and  all  s  G  Sp^rn{h)  are 

identical  to  s  (Lebesgue)  almost  everywhere  on  [mo,  m^v]-  Since  s  and  s  are  polynomials  of  order  p  on 
each  open  interval  {mk-i,mk),  k  =  1,2,  ■.■,N,  they  must  be  identical  possibly  except  on  m. 

Suppose  that  =  e~^  G  x-spF(m),m  =  is  the  density  of  a  random  variable  X^,  which 

we  aim  to  estimate.  Then,  for  any  s  G  e-spF(m),  dKL{h‘^\\e~^)  =  E{logh^{X^)}  +  i?{s(X°)}.  Hence, 
there  is  a  constant  term  (with  respect  to  s)  in  the  objective  function  of  (5)  that  can  be  dropped  and 
we  reach  the  fact  that  every  optimal  solution  of 

f-TTlN 

{Ppm)  •  such  that  /  e~^^Pdx  =  1  (6) 

’  see-splP(m) 

is  identical  to  except  possibly  on  m.  Consequently,  if  the  family  x-spF(m)  under  consideration 
contains  the  true  density  h^,  then  {Ppm)  recovers  hP  or  a  member  in  its  “equivalence  class.” 

In  contrast  to  we  refer  to  (H^m)  as  the  true  problem.  Intuitively,  if  G  and  n  is 

large,  the  problem  {Pp^m)  approximates  the  true  problem  in  some  sense  and  one  would  hope  that  the 
corresponding  optimal  solutions  are  close.  We  next  formalize  this  observation,  which  implies  strong 
consistency  of  the  estimator  obtained  from  solving  {Pp^m)- 

4.2  Theorem  (consistency).  Suppose  that  the  true  density  hP  =  with  =  {cp^rn{')',T^)  S 

e-spF(m)  and  m  =  {m^j^Q,  {Pp^m)  E  derived  by  independent  sampling  from  hP ,  and  1  is  a 

sequence  of  optimal  solutions  of  {Pp^m)>  with  epi-spline  parameters 

If  limi?*^  exists  a.s.^  and  is  deterministic,  then  every  accumulation  point  r°°  of  satisfies 

{cp,m{-),r^)€S^Z{h'')  a.s., 

where  S°°  =  {s  G  e-spF(m)  |  s  =  {cp^rn{-) ,  r) X  G  limi?”}. 

Moreover,  regardless  of  whether  has  a  limit,  if  there  exists  a  sequence  with  G  BP 

for  all  n,  such  that  f”  — >■  r^  a.s.,  then  the  following  hold  a.s. 

(i)  The  accumulation  point  also  satisfies  {cp^m{-)x°°)  G  Sp^rn{h^)- 

(a )  The  essential  epi-spline  parameter  suhvector  of  r°°  is  identical  to  the  essential  epi-spline  parameter 
subvector  of  r^. 

(in)  If  r^  along  a  subsequence  K,  then  {cp^rn{')x'^)  and  uniformly 

on  [mo,m7v],  possibly  except  on  m. 

Proof:  Since  X^  G  [mo,mAr]  a.s.,  Cp^m{X^)  is  a  random  vector  with  finite  moments.  By  the  law  of 
large  number  {1/n)  X)iLi  Cp,m{X^)  P{cp^m{X^)}  a.s.  Let  G  be  arbitrary.  Then,  for 

''T) 

any  sequence  r  —>  r  , 

{^^^Cp,m{XP,r^  ^  {E{Cp,m{X^)},f^)  a.s.. 

^Limits  of  sets  are  here  taken  in  the  sense  of  Painleve-Kuratowski  [56,  *[7.31  and  the  probability  space  is  that  induced 
by  {(TpV)}“  1. 
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For  any  R  C  jR(p+2)Af+i^  we  define  Sii{r)  :=  0  if  r  G  i?  and  6ji{r)  :=  oo  otherwise.  Moreover,  let 
Rf  :=  {r  G  liuiR^  \  =  1}.  If  r°  G  Rf,  then 

liminf  >  (F^{cp,m(^°)},  1"°)  +  (r°)  a.s. 

Since  Rf^  =  limiZ",  it  is  closed.  Consequently,  if  0  i?|°,  then  the  previous  inequality  holds  with 
inhnity  on  both  sides.  Next,  suppose  that  G  Jj(P+2)Af+i  jg  arbitrary.  If  ^  Rf^,  then 


lim  sup 


1 

n 


i=l 


+  [r"")  <  (^{Cp,m(^°)},  +  5ry  (f°)  =  oo  a.s. 


If  G  RY  ^  then,  since  R^  =  limiZ",  there  exists  a  sequence  f”  — >■  with  f"'  G  i2p  for  all  n. 

Consequently, 


n 


^Cp,^(X*),fM  +5«.(r-)  ^  {E{cp,m{X^)},f^)  +  5R^{r^)  a.s. 


2=1 


Epi-convergence  of  {{l/n)YJi=iCp,m{X''),-)  +  Sr^  to  {E{cp^rn{X^)} ,  ■)  +  ^Rf  a.s.  then  follows  by 
Proposition  7.2  in  [56]  and  the  hrst  conclusions  by  Theorem  7.31  of  [56]  and  the  fact  that  f  G 
argminr(E{cp,m(^° )},?")  +  ^r^  if  and  only  if  {cp^rn{],r)  G  <S^^(/i°). 

We  next  turn  to  the  second  part  of  the  theorem.  Since  the  additional  assumption  implies  that  R^ 
becomes  arbitrary  close  to  a.s.,  item  (i)  follows  by  a  similar  argument  as  above.  Items  (ii)  and  (iii) 
are  conclusions  from  the  discussion  following  Dehnition  4.1.  □ 

The  first  part  of  Theorem  4.2  shows  that  regardless  of  the  soft  information,  which  may  even  exclude 
the  true  density,  the  resulting  exponential  epi-splines  tend  to  one  that  is  as  “close”  as  possible  to 
the  true  density  under  the  given  constraints  as  the  sample  size  increases.  Specihcally,  the  epi-splines 
computed  from  {(P^m)}^i  tend  to  a  point  in  the  Kullback-Leibler  projection,  relative  to  the  soft 
information  constraint  set,  of  the  true  density  on  the  class  of  epi-splines  under  consideration.  We  refer 
to  [24,  14,  44]  for  related  results  on  model  misspecfication.  The  second  part  shows  that  if  the  true 
density  is  not  excluded  by  the  soft  information,  then  {{Pp^m)}'?^Li  eventually  yields  the  true  density,  or 
possibly  a  closely  related  one  that  deviates  at  most  on  m. 

The  preceding  results  deal  with  the  case  when  the  true  density  can  be  exactly  represented  by  an 
exponential  epi-spline.  If  the  true  density  is  outside  the  class  under  consideration,  one  cannot  expect  to 
tend  to  the  true  density  even  if  the  sample  size  goes  to  inhnity.  However,  as  we  see  next,  if  two  densities 
are  close  in  the  hypo-distance,  then  their  Kullback-Leibler  projections  on  e-spF(m)  must  also  be  close 
in  some  sense.  We  will  see  that  this  has  a  direct  consequence  on  the  quality  of  density  estimates  when 
the  true  density  is  outside  the  class  of  exponential  epi-splines.  Before  the  main  theorem,  we  give  an 
intermediate  result. 


4.3  Proposition  Suppose  that  :  IR  ^  [0,oo],  :  JR  ^  [0,oo]  are  Lebesgue  integrable  on  every 

compact  subset  of  R  and  d(— /”,— /°)  — >•  0.  Then,  for  every  compact  set  X  C  R,  f^(x)dx  — >■ 
fx  f{x)dx. 
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Proof:  The  restrictions  of  /"■  and  /°  to  X,  denoted  by  and  /^,  satisfy  d{—f^,  —fx)  0-  Conse¬ 
quently,  :=  {(xjXo)  €  X  X  [0,  oo)  |  fx{x)  >  xo}  A^x  ■=  £  X  x  [0,oo)  [  fx{x)  >  xq}  in 

the  Painleve-Kuratowski  sense.  Since  the  Lebesgue  measures  of  and  A^  are  identical  to  f^{x)dx 
and  fx  f^{x)dx,  respectively,  the  conclusion  follows.  □ 

4.4  Theorem  (stability  of  Kullback-Leibler  projection).  Suppose  that  densities  hA,hP  on  [l,u]  satisfy 

d{—hA,  —hP)  — )■  0.  If  is  such  that  {cp^mf) £  Sp^rn{f>A)  for  m  =  {w,fc}^Q  with  mo  =  I,  mx  =  u, 
then  every  accumulation  point  of  is  the  epi-spline  parameter  of  some  £  Sp^rn{h^)- 

Proof:  Following  a  similar  argument  as  in  Proposition  2.8,  we  see  that  the  equality  constraints  in 
the  problems  defining  Sp^rn{fiA)  and  Sp^rnOP)  can  be  replaced  by  inequality.  Consequently,  every 
s”  £  Sp^rn{fP)  is  of  the  form  s"’  =  {cp^rn{-),r^),  with  r”  £  argmin,.  V'” (r)  +  6j{r),  where  'ijj^{r)  :  = 
{fffff  Cp^rn{x)hA{x)dx,r)  and  6i{r)  :=  0  if  ffff  dx  <  1  and  di{r)  :=  oo  otherwise.  Similarly, 

every  s®  £  Sp^rn{h^)  is  of  the  form  =  {cp^rni'),P),  where  r®  is  a  minimizer  of  defined  similar  to 
V’"',  but  with  h”  replaced  by  h^.  Clearly,  6i  and  ip^  +  6j  are  convex. 

By  Proposition  4.3,  f^hA{x)dx  — )>  f^hP{x)dx  for  any  compact  set  X  C  [mo,mAr]-  But  since  Cp^ra 
is  piecewise  polynomial  and  [mo,  m,Ar]  is  a  bounded  interval,  we  also  have  that  for  any  /c  =  1,  2, 

PTTlk  r^k 

/  Cp^rn{x)hP  {x)dx  /  Cp^rn{x)h^  {x)dx. 

Jmk-i  Jmk_i 

Hence,  it  follows  by  Proposition  7.2  and  Theorem  7.53  in  [56]  that  ip'^  +  6i  totally  epi-converges  to 
pp  +  dj.  The  result  then  is  a  consequence  of  Corollary  7.55  in  [56].  □ 

If  we  take  the  densities  hA  in  Theorem  4.4  to  be  exponential  epi-splines,  possibly  defined  on  increas¬ 
ingly  fine  meshes.  Theorem  2.4  shows  that  these  densities  indeed  can  be  made  to  approximate  with 
arbitrary  accuracy  any  Isc  or  use  density  hP  with  appropriate  choice  of  the  mesh.  Consequently,  the 
assumption  of  d{—hP,  —hP)  — >■  0  in  Theorem  4.4  holds  and,  combined  with  Theorem  4.2,  we  find  that 
for  a  fine  mesh  and  a  large  sample  size  the  resulting  exponential  epi-spline  estimator  is  “close”  to  the 
true  density,  even  if  that  density  is  outside  the  class  of  exponential  epi-splines. 

“Convergence”  in  the  Kullback-Leibler  divergence  is  closely  related  to  other  modes  of  convergence 
as  stated  next. 

4.5  Proposition  Suppose  that  densities  hP,hP  £  x-spF(m),  with  hP  = 

=  (Cesh>^ess):  and  rO  =  Then, 

dxLih^Wh^)  ^  0  ^  dKLnh^)  ^  0  ^  C,  ^  r^. 

Proof:  We  let  r”  =  and  =  (x[(,eshXess)'  The  implication  r"  — >■  dKiih^Wh'^)  0 

follows  directly  from  Proposition  3.1. 

To  show  that  dKL{h^\\h^)  0  — >■  rggg  we  observe  that  dKLfWP  >  0  and  for  any  two 

densities  f,g  on  [mo,mAr],  dxLifWg)  =  0  if  and  only  if  /(x)  =  g{x)  for  Lebesgue  almost  every  x  £ 
[mo,mAr].  We  therefore  consider  the  problem  miurgi? with  ii  =  {r  £  | 

/mcT  dx  =  1},  where  is  a  minimizer  and  in  fact  every  minimizer  must  coincide  with  rggg 

in  its  last  {p  +  1)N  components.  In  view  of  Proposition  3.1,  the  objective  function  in  this  problem 
is  linear  and  the  single  constraint  is  continuously  differentiable.  The  first-order  optimality  condition 
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for  this  problem  and  the  fact  that  {r  €  g  (cp,m(a;),r)^^  ^  jg  convex  imply  that  the 

hyperplane  VF  =  {r  G  =  0}  is  a  supporting  hyperplane  of  R  with 

being  the  only  (p  +  l)A^-dimensional  vector  rgss  that  can  be  augmented  by  a  /3  G  such  that 

{(/3,ress)}  =  Rn  W.  Since  £  R  and  for  sufficiently  large  n  is  arbitrarily  close  to  W,  we  reach  the 
desired  conclusion. 

We  realize  that  — )•  0  =i>  — >■  0  by  establishing  that  — >•  whenever 

dKL{h^\\h^)  — >■  0  using  a  similar  argument  as  above  and  then  use  Proposition  3.1. 

We  find  that  dKiih^Wh"')  0  dKL{h'^\\h^)  — >  0  by  invoking  that  dKL{h^\\h"')  0  — > 

Tggg  and  Proposition  3.1.  □ 

Asymptotic  normality  of  the  distribution  of  the  exponential  epi-spline  estimator  and  corresponding 
moments  may  also  hold  when  we  limit  the  scope  to  the  essential  epi-spline  parameters.  As  we  see 
from  the  discussion  before  Proposition  2.9,  one  cannot  expect  a  unique  estimator  —  a  prerequisite 
for  asymptotic  normality  —  unless  the  scope  is  limited  in  this  manner^.  This  focus  on  the  essential 
epi-spline  parameter  requires  additional  notation  that  we  introduce  next. 

For  any  rgss  £  ,  let^  H{ress)  :=  J’^J^)cgss(a:),  be  the  Hessian  of  the 

function  dx  at  rgss-  We  also  let  Sggs  be  the  variance-covariance  matrix  of  Cess(-A*^),  with 

distributed  by  the  true  density  and  S(ress)  :=  H{ress)~^^essH{'f’ess)~^ ■,  where  we  note  that 
if(ress)  is  nonsingular  by  the  argument  in  the  proof  of  Proposition  2.9.  For  notational  convenience, 
we  also  let  ’Eki'i^ess)  be  the  (p  +  1)  x  (p  -|-  1)  submatrix  of  S(rgss)  consisting  of  elements  in  columns 
{k  —  l){p  +  1)  -|-  1,  {k  —  l){p  +  1)  +  2,  ...,  {k  —  l){p  -|-  1)  +  (p  +  1)  and  the  corresponding  rows  in  the 
latter  matrix.  These  are  the  coefficients  corresponding  to  interval  (mfc_i,mfc).  Moreover,  let  ress,k  be 
the  subvector  of  components  Al-|- 1  -|-  (/c  —  l)(p-|- 1)  -|- 1,  ...,  Al-|- 1  -|-  (A:  —  l)(p-|- 1)  -|-  (p-|- 1)  of  rgss,  i.e.,  the 
parameters  that  define  the  epi-spline  in  m^),  and  the  corresponding  subvectors  of  Cgss  are  denoted 

by  Cess,k-  Finally,  we  let  :=  x^hP{x)dx  be  the  jth  moment  of  the  true  density  hP ,  A^(0,  S)  denote 
a  zero-mean  normal  vector  with  variance-covariance  matrix  S,  and  — convergence  in  distribution.  We 
are  now  ready  to  state  an  asymptotic  result  for  an  exponential  epi-spline  estimator,  where  we  make  the 
assumption  that  the  soft  information  is  “clearly”  correct,  i.e.,  the  true  density  corresponds  to  a  point 
in  the  interior  of  the  sets  RJ^  a.s.  for  sufficiently  large  n.  Moreover,  we  assume  that  the  true  density  is 
an  exponential  epi-spline.  Although  this  might  at  first  appear  restrictive.  Theorem  2.4  shows  that  such 
densities  can  approximate  to  an  arbitrary  level  of  accuracy  essentially  any  density. 

4.6  Theorem  (asymptotics).  Suppose  that  the  true  density  =  e~^°  G  x-spF(m),  with  m  = 
=  {cp^rn{') }  and  =  (?'mesh> ’’ess)  interior  of  the  (set)  inner  limit  of  the 

R^  a.s.  If  (Pp^rn)  P  obtained  by  independent  sampling  from  h?  and  is  a  sequence  of  optimal 

solutions  of  with  epi-spline  parameters  {r”  =  (j’mgsh) and  /i”  =  for  all  n, 

then  the  following  hold: 

(i) 

-  ress)  -^(0,  S(r°gg)) 

^One  could  appeal  to  more  sophisticated  central  limit  theorems,  such  as  those  in  [38],  but  additional  conditions  and 
machinery  is  required  and  would  require  us  to  stray  too  far  from  our  main  theme. 

II  We  use  }y,  y{  to  denote  the  outer  product  yy^  for  a  column  vector  y. 
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(a)  For  X  G  TTifc),  k  =  1,2, ...,  N 


n^/\h^{x)  -  h\x))  N  (O,  e-2(-e.,.(x),r-e.,.)  S,(rL)cess,fc(x)))  . 


(in) 


For  j  G  IN,  and  setting  w  =  x^Cess{x)e  the  moment  estimator  fx'j  = 

rmjv  ^  satisfies 

Jmo 


-  f])  -^(0,  {w,  S(r°,Jy;)). 


Proof:  The  law  of  large  number  gives  that  the  objective  function  in  (P^^)  converges  uniformly  on 
compact  sets  to  that  of  (Pp^m)  a.s.  We  recall  that  {cp^mi') ,  is  an  optimal  solution  of  (Pp  ^)  by 
assumption,  is  also  in  the  interior  of  the  (set)  inner  limit  of  the  a.s.  Consequently,  since  (Pp^m) 
does  not  involve  a  restriction  S^,  the  set  of  optimal  solutions  of  (P^m)  coincides  with  those  of  the 
relaxation  of  (Pp^m)  with  5”  replaced  by  e-spP(m)  for  sufficiently  large  n.  Let 


(pn 
V-*  ei 


1 

min  —  >  < 


Cess(^')5  *^ess)  H“ 


i=l 


rlUN 
J  mo 


3  (Cess(a:),?’ess)^2 


where  ,  X'^,  ...,  X”  is  the  sample  from  hP.  We  deduce  from  Propositions  2.8  and  2.9  that  (P^g)  and 
the  relaxation  of  (P^m)  have  unique  optimal  solutions  a.s.  and  that  they  are  equivalent  in  the  sense 
that  they  generate  the  same  essential  epi-spline  parameter.  Consequently,  for  sufficiently  large  re,  the 
optimal  solution  of  (Pg^g)  is  r”g  a.s. 

Let  X^  be  a  random  variable  with  density  hP  and 

rrriN 

(PL)-  min  P{(cegg(X°),regg)}+  / 

Jmo 

We  deduce  from  Propositions  2.8  and  2.9  that  an  optimal  solution  of  this  problem  is  unique  and  coincides 
with  the  essential  epi-spline  parameter  rggg  of  hP. 

Since  (Pg^g)  and  (Pg^g)  are  strictly  convex  and  unconstrained  a.s.,  their  unique  optimal  solutions 
are  equivalently  characterized  as  the  zeros  of  the  objective  function  gradients.  Since  these  gradients 
converge  uniformly  on  a.s.  by  the  law  of  large  numbers,  and  the  corresponding  Hessians  are 

identical  and  positive  dehnite,  item  (i)  follows  directly  from  Theorem  4  of  [51].  The  next  items  follow 
by  a  direct  application  of  a  Delta  Theorem;  see,  for  example,  §7.2.7  in  [62].  □ 

Although  Theorem  4.6  provides  rates  of  convergence,  it  excludes  the  possibility  of  soft  information 
in  influencing  the  estimates  for  large  samples  and,  in  addition,  deals  only  with  the  essential  epi-spline 
parameter.  We  end  the  section  by  examining  errors  for  a  finite  sample  size  under  relaxed  assumptions, 
which  leads  to  another  rate  of  convergence  result.  However,  the  treatment  requires  us  to  focus  on 
e-optimal  solutions  of  {rlxPp  „^),  now  viewed  as  a  random  optimization  problem,  which  for  any  e  >  0 
are  dehned  as 


77”  :=  <  r  G  P" 


"■  fmisr 

Y,{cp,m{X^)x)  <V^  +  e,  /  eP^^’-^^PPdx  <  1  L 

i=l  ) 
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where  the  optimal  value  of  {rlxP^  ,^)  is 


:= 


inf  —  {cnrn{X''),r)  such  that 

re-R"  n 


2=1 


rruN 
J  mo 


o-{Cp,m{x),r)  <  1. 


The  statements  below  deal  with  the  difference  between  the  true  density  and  := 
with  r”  €  for  e  >  0.  In  fact,  a  numerical  method  for  solving  a  realization  of  (r/xP”^)  generates 
an  element  of  P”,  and  consequently  also  an  estimate  /i”,  as  such  methods  utilize  finite  precision  and 
various  tolerances.  Also  let  plB  :=  {y  \  ||y||  <  p}  in  any  Euclidean  space,  :=  iiiaxi=o,i,...,p  |"l|^  and 
d{x,  S)  :=  infyg5  ||x  —  y\\  for  x  G  JR^,  S  C  IR^. 


4.7  Theorem  (hnite  sample  error).  Suppose  that  the  true  density  G  x-spE(m),  m  =  with 

epi-spline  parameter  r^,  and  {rlxPp  „^)  is  derived  by  independent  sampling  from  h? ,  has  a  nonempty 
feasible  set  a.s.,  and  R^  is  closed  and  convex  a.s.  For  any  a  >  0,  e  >  0,  p  >  max{— E"',  d(r°,  Pq)},  and 
some  /i”  =  r”  G 


>  K  and  dKL(h^llh^)  > 

with  probability  at  most  2{p  +  where 

ip 


pmisr 

/  Cp^m{x)h^  {x)dx 
J  mo 


K. 


K~[l  + 


Oi{p  +  ||r^||)\/(p-j-l)iV  +  ( 1  +  Ap^m^/  {p  +  2)  A  +  1 )  d(r^,  R^) 


Proof:  Let  be  a  random  variable  with  density  hP  and  ,X‘^, ...,  X'^  be  the  sample  that  generates 
{Pp^rn)-  We  denote  by  c^^miX^)  the  components  of  Cp^miX^),  j  =  1,2,...,  (p  +  2)N  +  1.  For  j  = 
1,  2, ...,  At  1,  cPp^miX^)  =  1  if  X^  =  mj-i  and  (Pp^rn{X^)  =  0  otherwise.  Consequently,  E{(Pp^rn{X^)}  =  0 
and,  likewise,  (1/n)  YPJi=i  Cp,m(X‘)  =  0  a.s.  For  j  =  N  +  1  +  (p  +  l)(k  —  1)  +  I  +  1,  I  =  0, 1,  ...,p, 
k  =  1,2,  ...,A,  Cp^rn(X^)  =  {X^—mk-iY  S  andcp,m(Ai°)  =  0  otherwise.  Consequently, 

for  J  =  A  +  2,  A  +  3, ...,  {p  +  2)N  +  1,  G  [0,  Ap^m]  a.s.  and  by  Hoeffding’s  inequality. 


P 


n 


for  every  a  >  0.  Moreover,  Boole’s  inequality  gives,  when  taking  advantage  of  the  zero  error  for 
j  =  1, ...,  A  +  1,  that 


P 


(p+2)Af+l 

u 


1=1 


n 


<  2{p  + 


Let  :  l?(P+2)^+i  ^  be  defined  by  :=  (l/n)  ')  + where  5^{r)  :=  0  if 

r  G  R^  and  ^  (5”'(r)  :=  oo  otherwise.  Let  ^  be  defined 

by  :=  P{(cp^m(X'^),  •)}  +  where  d^’”‘(r)  :=  0  if  ^  r  is  in  the  convex 

hull  of  R^  and  r^,  and  5^’^{r)  :=  oo  otherwise. 
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In  view  of  the  preceding  results  and  definitions,  for  r  —  G  pJB,  with  p  G  (0,  oo), 


(1/n)  r)  -  B{(c^,^(X^),r)} 

i=l 


<  a{p  +  |k°||)V {p  +  1)A^ 


with  at  least  probability  1  —  2{p  +  l)A^e  2n(o/Ap,,„)^^  Using  this  fact,  Example  7.62  of  [56]  gives  that 
with  the  same  probability, 

dp  (99”,  <  a{p  +  ||r*^||)  v^(p"+TjiV  +  ^1  +  {p  +  2)A^  +  1^  d{r^,  T?""), 

where  is  closely  related  to  dp]  see  §7.1  in  [56].  Then,  from  Theorem  7.69  in  [56],  we  deduce  the  first 
result  after  realizing  that  is  an  e-optimal  solution  of  min(/?®’”',  where  the  additional  factor  1  +  4p/e 
arises  from  that  theorem.  Proposition  3.1  yields  the  second  conclusion.  □ 

Theorem  4.7  shows  that  there  are  two  sources  of  error  in  the  estimation  process  corresponding  to 
the  two  parts  of  K.  The  first  source  is  sampling  error,  represented  by  the  term  involving  ce,  which  can 
be  made  small  by  selecting  a  small  a  and  this  error  is  only  exceeded  with  a  small  probability  if  na^ 
is  large.  The  second  source  is  caused  by  d{r^,R^),  the  distance  between  the  true  epi-spline  parameter 
and  the  constraint  set  R^.  Of  course,  if  only  appropriate  soft  information  is  included,  then  G 
and  d{r^,R^)  =  0.  Otherwise,  incorrect  specihcation  of  soft  information  induces  a  “bias”  in  the  density 
estimator.  We  note  that  Theorem  4.7  provides  additional  support  for  considering  {rlxPp  „^)  also  for 
instances  which  are  not  loosely  constrained.  Even  in  such  cases,  {rlxPp  „^)  is  guaranteed  to  generate  a 
density  near  the  true  density. 

We  recall  the  notion  of  “bounded  in  probability.”  For  a  sequence  of  random  variables  we 

write  y”  =  Op(l)  when  for  any  C  >  Oj  there  exists  a  /3  >  0  such  that  Prob(|y"'|  >  /3)  <  C  for  all  n. 


4.8  Corollary  For  sufRciently  large  n,  suppose  that  the  assumptions  of  Theorem  4. 7  hold  and  d{r^,  R^) 
=  0  a.s.  Then, 

n^^^dKiih^WK)  =  Op(l)  for  some  r"  G  77^ 

Proof:  Theorem  4.7  and  the  fact  that  d{r^,R^)  =  0  imply  that  for  sufficiently  large  n 

Froh{n}/‘^dKL{h^\K)  >  K'an^/'^)  <  2{p  + 


where  K'  =  Cp^rn{x)h^ {x)dx  (1  +  4/9/e)(/9  +  Ur'^ID-y/ {p  +  1)N.  We  let  C  >  0  and  couple  a  and 

n  such  that  (  =  2{p  +  jp  _  —  p  log(C/2(p  +  l)A)/(2a^).  Consequently, 

Prob(n^/^(i;^i(/i°||/i”)  >  /3)  <  Ci  where  fd  :=  i7'(— p log(C/2(p  +  l)A)/2)^/^  and  the  conclusion 
follows.  □ 

In  view  of  the  preceding  result,  we  see  that  the  canonical  rate  of  is  obtained  for  the  exponential 
epi-spline  estimator  even  if  soft  information  is  “active.” 


5  Numerical  Examples 

We  illustrate  the  exponential  epi-spline  estimator  through  a  series  of  examples  using  a  freely  available 
Matlab  toolbox  [59]  that  relies  on  the  fmincon  solver  (Matlab  7.10.0);  see  also  [8]  for  a  corresponding 
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R  toolbox.  The  focus  is  on  showing  the  effect  of  including  various  sources  of  soft  information  in  the 
context  of  small  sample  sizes.  §5.1  shows  estimates  of  an  exponential  density  using  10  observation 
and  an  increasing  collection  of  soft  information.  §5.2  provides  an  alternative  to  the  Bayesian  paradigm 
and  demonstrates  how  a  diverse  family  of  densities  can  be  generated.  §5.3  examines  the  probability 
density  of  customer  time-in-service  for  a  modified  M/M/1  queue.  §5.4  shows  the  effect  of  incorrect 
soft  information.  The  section  ends  with  §5.5,  where  soft  information  about  moments  is  examined  for 
increasing  sample  sizes.  It  is  beyond  the  scope  of  the  paper  to  include  a  comprehensive  comparison 
with  alternative  density  estimators,  which  in  any  case  have  difficulties  with  incorporating  an  arbitrary 
set  of  soft  information.  Occasionally,  we  simply  contrast  with  kernel  estimates  using  “ksdensity”  in 
Matlab,  a  Gaussian  kernel,  and  default  bandwidths,  which  are  optimized  in  some  sense  for  the  normal 
densities.  These  estimates  can  possibly  be  improved  with  better  bandwidth  and  kernel  choices.  In  all 
cases,  we  use  epi-splines  of  order  2  and  if  there  is  no  soft  information  about  support  bounds,  we  set 
mo  (mAf)  to  two  sample-estimated  standard  errors  below  (above)  the  smallest  (largest)  sample  point, 
and  use  uniform  meshes.  The  set  always  includes  the  loose  constraints  —1000  <  r  <  1000.  The 
Gauss-Legendre  quadrature  rule  with  20  points  evaluates  the  integrals  over  each  segment  (mfc-i,mfc) 
with  high  accuracy.  We  often  assess  the  quality  of  an  estimate  of  a  density  by  the  mean-square 
error  (MSE)  —  h^{x))‘^h^{x)dx.  For  additional  numerical  results  we  refer  to  [66,  58,  65,  57]. 

5.1  Value  of  Soft  Information 

We  illustrate  the  effect  of  soft  information  in  a  simple  example.  For  a  true  exponential  density  with 
parameter  A  =  1  (dotted  black  curves  in  Figure  2)  and  a  sample  of  size  10  (see  green  stems),  Figure  2 
shows  our  exponential  epi-spline  estimates  (solid  red  curves)  under  two  classes  of  soft  information:  (a) 
continuously  differentiable,  nonnegatively  supported,  and  log-concave  density  and  (b)  also  nonincreasing 
density  and  a  relative  bound  on  the  slope.  The  soft  information  about  relative  slope  amounts  to  letting 
the  quantity  h"'' (x) / (x)  be  in  the  interval  [—1,0].  We  observe  that  the  exponential  density  with 
parameter  A  =  1  has  hP’{x)/hP{x)  =  —1  for  all  x  >  0. 

For  comparison,  a  kernel  estimate,  incorporating  information  about  a  nonnegative  support,  is  dis¬ 
played  by  dashed  black  curves.  The  exponential  epi-spline  estimates  are  obtained  using  a  mesh  with 
N  =  10.  In  Figure  2(a),  MSE  is  0.1144  and  0.3273  for  exponential  epi-spline  and  kernel  estimates,  re¬ 
spectively.  The  kernel  estimate  reaches  well  above  4.5  near  zero,  though  the  plots  are  truncated  for  the 
sake  of  clarity.  Figure  2(b)  shows  the  visually  improved  exponential  epi-spline  estimate  with  a  reduced 
MSE  of  0.0416.  The  exponential  epi-spline  estimates  miss  the  density  peak  at  zero,  but  the  present 
sample  provides  few  indications  about  such  a  peak  and  its  capture  will  naturally  be  difficult.  Still, 
the  exponential  epi-spline  estimate  is  both  qualitatively  and  quantitatively  close  to  the  true  density 
elsewhere.  The  ability  to  incorporate  various  kinds  of  soft  information  along  the  lines  illustrated  here 
offers  the  analyst  a  valuable  tool  for  exploring  assumptions  and  their  consequences.  One  can  attempt  to 
improve  the  kernel  estimate  using  various  bandwidth  as  well  as  truncation  (see  for  example  [68,  p.l9]). 
The  effect  on  bandwidth  in  the  kernel  estimate  is  illustrated  in  Figure  3(a),  where  the  case  with  default 
bandwidth  (given  in  Figure  2(a))  is  supplemented  by  estimates  using  bandwidth  0.15,  0.2625,  0.375, 
0.4875,  and  0.6.  The  combination  of  a  nonnegative  support  and  few  data  points  make  it  nontrivial 
to  select  an  appropriate  bandwidth  and  the  estimates  remain  mostly  qualitatively  similar.  In  Figure 
3(b)  we  remove  the  requirement  of  a  nonnegative  support.  Again,  the  choice  of  bandwidth  appears 
challenging.  However,  truncation  and  renormalization  of  the  portion  of  the  density  estimates  to  the  left 


25 


of  the  origin  improves  the  situation;  see  the  dashed  blue  lines  in  Figures  2(a),  2(b),  and  3(a)  that  give 
the  resulting  estimate  when  truncating  the  (default)  density  estimate  in  Figure  3(b)  (black  line). 

5.2  Kullback-Leibler  Divergence  and  the  Bayesian  Paradigm 

Our  framework  provides  an  alternative  to  traditional  Bayesian  updating.  In  addition  to  the  inclusion  of 
numerous  types  of  soft  information — which  can  be  viewed  as  “prior”  information — we  may  also  directly 
restrict  (P^.p)  to  a  neighborhood  of  a  reference  density  using  (4).  To  illustrate  the  framework, 
consider  a  reference  (prior)  density  that  is  standard  normal  and  a  sample  consisting  of  10  points  from 
the  same  density;  see  Figure  4.  We  set  =  10  and  restrict  the  search  to  continuously  differentiable 
densities.  If  no  emphasis  is  placed  on  the  reference  density,  i.e.,  (/?(10)  =  oo  in  (4),  then  we  obtain 
the  exponential  epi-spline  estimate  marked  with  the  red  dotted  line  in  Figure  4.  As  proximity  to  the 
reference  density  is  enforced  more  vigorously  by  setting  (/j(10)  =  1,  0.1,  and  0.01,  we  obtain  the  dashdot, 
dashed,  and  solid  lines,  respectively,  in  Figure  4.  The  Kullback-Leibler  divergence  constraints  dampen 
the  oscillations  caused  by  the  sample  by  a  degree  determined  by  (/?(10),  which  in  practice  should  be 
selected  based  on  the  confidence  in  the  correctness  of  the  reference  density. 

A  related  situation  arises  when  an  analyst  would  like  to  generate  multiple  densities  that  span  a 
range  of  possibilities,  for  example  to  account  in  some  manner  for  questionable  soft  information.  For 
example,  when  the  estimated  density  is  to  be  used  as  input  in  further  simulation  and  optimization, 
it  may  be  prudent  to  consider  a  set  of  densities  and  possibly  let  planning  be  based  on  the  worst 
density  in  some  sense.  We  illustrate  this  situation  by  returning  to  the  exponential  example  of  §5.1. 
Suppose  that  the  second  density  generated  there  (see  Figure  2(b))  is  considered  plausible,  but  we 
would  like  to  also  generate  relevant  alternatives.  Retaining  a  restriction  to  continuously  differentiable, 
nonincreasing,  and  nonnegatively  supported  densities,  we  construct  three  alternatives  by  imposing  (4) 
with  <  replaced  by  >  and  right-hand  side  0.1,0.01,  and  0.001,  and  being  the  original  estimate 
in  Figure  2(b).  Consequently,  we  determine  densities  that  are  at  least  certain  “distances”  away  from 
the  original  estimate  in  the  sense  of  Kullback-Leibler  divergence,  while  still  maximizing  the  likelihood 
function  of  the  sample.  Figure  5  shows  the  results  with  the  solid  red  line  and  dotted  black  line  showing 
the  original  estimate  and  true  density  as  in  Figure  2(b).  The  alternative  densities  are  depicted  with 
dashed,  dot-dashed,  and  dotted  red  lines  for  right-hand  sides  of  0.001,0.01,  and  0.1,  respectively.  We 
observe  that  even  though  based  on  only  10  sample  points,  the  original  together  with  the  alternative 
densities  provide  a  “diversified”  set  of  densities  near  the  true  density  well  suited  as  input  for  further 
studies. 

5.3  Estimation  of  Queueing  Model  Output 

Significant  challenges  arise  when  the  density  to  be  estimated  is  discontinuous.  We  illustrate  this  situ¬ 
ation  here  by  an  example  taken  from  [65];  see  [57]  for  additional  examples.  Suppose  that  the  random 
variable  of  interest  is  the  customer  time- in-service  of  a  modihed  M/M/1  queue  with  arrival  rate  A  =  1 
and  service  rate  //  =  1.5,  but  where  50%  of  customers  who  enter  the  system  are  held  at  a  separate 
station  for  two  time  units.  Obviously,  the  true  density  is  an  equal  mixture  of  the  probability  density 
of  the  customer  time-in-service  without  a  separate  station  (an  exponential  density)  and  the  same  den¬ 
sity  shifted  to  the  right  by  two  time  units,  yielding  a  discontinuous  density.  Using  a  sample  of  size 
n  =  100,  we  aim  to  recover  this  density  using  a  lower  semicontinuous  exponential  epi-spline  estimate 
with  N  =  10. 
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(a) 


(b) 

Figure  2:  Exponential  example:  (a)  continuously  differentiable,  nonnegative  support,  and  log-concave, 
(b)  also  nonincreasing  and  relative  bounds  on  slope. 
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Figure  3:  Exponential  example:  Kernel  estimates  using  varying  bandwidth  and  (a)  nonnegative  support 
and  (b)  unbounded  support. 
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density  density 


Figure  4:  Normal  Example:  Kullback-Leibler  divergence  constraint. 


Figure  5:  Exponential  Example:  Diversification  through  Kullback-Leibler  divergence. 
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Table  1  shows  aggregated  results  across  100  meta-replications  for  a  variety  of  soft  information.  The 
first  row  of  results  shows  MSE  under  no  additional  information  beyond  lower  semicontinuity  and  bounds 
on  the  second-order  derivatives.  The  second  row  corresponds  to  a  restriction  of  the  slope  to  be  in  the 
interval  [—4, 0]  and  the  third  row  assumes  a  nonnegative  support.  The  last  row  incorporates  bounds  on 
the  slope,  nonnegative  support,  and  log-concavity  of  the  upper  tail.  We  show  that  the  average  MSE 
(second  column)  decreases  with  increasing  soft  information  and  mostly  also  the  standard  deviation  of 
the  MSE  (third  column). 


Information 

Average  MSE 

Standard  Deviation 

no  additional  info. 

0.0045 

0.0020 

slope 

0.0040 

0.0016 

lower  support  bound  (lb) 

0.0040 

0.0021 

slope,  lb,  and  tail 

0.0030 

0.0017 

Table  1:  MSE  of  customer  time- in-service  for  queueing  model  with  various  levels  of  soft  information. 

Figure  6  shows  an  instance  corresponding  to  the  last  row  in  Table  1.  The  MSE  of  the  exponential 
epi-spline  (red  line)  estimate  is  0.0016.  The  exponential  epi-spline  estimate  captures  the  essence  of  the 
true  density  (dotted  line)  rather  well. 

5.4  Incorrect  Soft  Information 

As  given  by  Theorem  4.2,  optimal  solutions  of  (E^m)  tend  to  a  point  in  the  Kullback-Leibler  projection 
of  the  true  density  relative  to  the  set  constructed  by  the  soft  information  as  the  sample  size  grows. 
Consequently,  in  the  presence  of  incorrect  soft  information  that  excludes  h^,  we  achieve  the  density 
“nearest”  to  within  the  set  of  densities  satisfying  the  (incorrect)  soft  information.  We  illustrate  this 
situation  by  considering  a  standard  normal  density  and  its  exponential  epi-splines  estimates  based  on 
N  =  10.  We  adopt  soft  information  about  continuous  differentiability  and  log-concavity.  In  addition, 
we  impose  the  incorrect  constraint  that  the  expected  value  must  be  no  larger  than  —0.5.  Figure  7(a) 
shows  the  resulting  exponential  epi-spline  estimate  (solid  red  line)  and  the  kernel  estimate  (dashed 
black  line)  for  n  =  100.  Figure  7(b)  displays  the  corresponding  results  for  n  =  1000.  We  observe  that 
while  the  kernel  estimator  benefits  from  the  larger  sample  size  and  obtains  a  nearly  perfect  estimate 
for  n  =  1000,  the  unfortunate  expectation  constraint  on  the  exponential  epi-spline  prevents  it  from 
approaching  the  true  density.  However,  we  obtain  a  “normal-looking”  density  with  a  shifted  mean  of 
-0.5. 

5.5  Moment  Information 

We  end  the  section  by  presenting  a  summary  of  results  over  a  range  of  sample  sizes  for  a  normal  density 
with  zero  mean  and  standard  deviation  of  two.  We  carry  out  104  meta-replications  and  compute 
average  and  standard  deviation  of  the  resulting  MSE  for  both  an  exponential  epi-spline  estimate  and 
a  kernel  estimate.  We  use  N  =  20  and  soft  information  that  amounts  to  continuous  differentiability, 
log-concavity,  and  bounds  on  first  and  second  moments  that  ensure  estimates  with  moments  within 
20%  of  their  correct  values. 
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Figure  6:  Customer  time-in-service  density. 
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(a) 


X 


(b) 

Figure  7:  Normal  Example:  Estimates  for  n  =  100  (a)  and  n  =  1000  (b)  with  incorrect  constraint 
xh^{x)dx  <  —0.5. 
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Figure  8:  Average  Normal  Example:  Average  (a)  and  standard  deviation  (b)  of  MSE  for  exponential 
epi-spline  and  kernel  estimators  for  a  range  of  sample  sizes. 
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Figure  8  gives  the  corresponding  average  and  standard  deviation  of  the  MSE  for  a  range  of  sample 
sizes.  We  see  that  the  exponential  epi-splines  estimates  result  in  smaller  MSE,  on  average.  However, 
the  advantage  decreases  as  the  sample  size  grows  as  expected. 

6  Conclusions 

We  have  developed  a  constrained  maximum  likelihood  estimator  that  incorporates  any  soft  information 
that  might  be  available  and  therefore  offers  substantial  flexibility  for  practitioners.  In  particular  in 
situations  with  few  (hard)  observations,  soft  information  can  be  brought  in  and  reasonable  estimates 
can  be  achieved  with  as  little  as  10  sample  points.  In  simple  but  illustrative  examples  of  estimating 
exponential,  normal,  and  mixture  of  exponential  distributions,  we  construct  new  estimates  under  a 
variety  of  soft  information  not  commonly  considered.  The  estimator  requires  the  solution  of  an  infinite¬ 
dimensional  optimization  problem,  which  we  carry  out  approximately  utilizing  exponential  epi-splines. 
The  justification  stems  from  the  fact  that  exponential  epi-splines  can  approximate  to  an  arbitrary  level  of 
accuracy  practically  any  density.  We  show  that  optimization  over  exponential  epi-splines  often  reduces 
to  convex  programming.  Our  theoretical  development  establishes  consistency,  asymptotic  normality, 
and  finite  sample  error  of  order  under  the  assumption  that  the  true  density  is  an  exponential 

epi-spline. 


34 


References 

[1]  H.  Attouch,  R.  Lucchetti,  and  R.  Wets.  The  topology  of  the  p-Hausdorff  distance.  Annali  di 
Matematica  pura  ed  applicata,  CLX:303-320,  1991. 

[2]  F.  Balabdaoni,  K.  Rufiback,  and  J.  A.  Wellner.  Limit  distribution  theory  for  maximum  likelihood 
estimation  of  a  log-concave  density.  Annals  of  Statistics,  37:1299-1331,  2009. 

[3]  F.  Balabdaoni  and  J.  A.  Wellner.  Estimation  of  a  k-monotone  density:  limit  distribution  theory 
and  the  spline  connection.  Annals  of  Statistics,  35(6),  2007. 

[4]  F.  Balabdaoni  and  J.  A.  Wellner.  Estimation  of  a  k-monotone  density:  characterizations,  consis¬ 
tency  and  minimax  lower  bounds.  Statistica  Neerlandica,  64(1),  2010. 

[5]  R.R.  Barton,  B.L.  Nelson,  and  W.  Xie.  A  framework  for  input  uncertainty  analysis.  In  Proceedings 
of  the  2002  Winter  Simulation  Conference,  2010. 

[6]  M.  Birke.  Shape  constrained  kernel  density  estimation.  Journal  of  Statistieal  Planning  and  Infer- 
enee,  139:2851-2862,  2009. 

[7]  P.  Biihlmann  and  S.  van  de  Geer.  Statistics  for  High- Dimensional  Data,  Methods,  Theory  and 
Applieations.  Springer,  2011. 

[8]  S.  Buttrey,  J.O.  Royset,  and  R.  Wets.  XSPL  estimator:  An  R  toolbox, 

http:/ /faculty.nps.edu/joroyset/XSPL.html,  2014. 

[9]  R.  J.  Carroll,  A.  Delaigle,  and  P.  Hall.  Testing  and  estimating  shape-constrained  nonparametric 
density  and  regression  in  the  presence  of  measurement  error.  Journal  of  the  American  Statistical 
Association,  106(493):191-202,  2011. 

[10]  X.  Chen.  Large  sample  sieve  estimation  of  semi-nonparametric  models.  In  Handbook  of  Economet¬ 
ric,  pages  5549-5632.  2007.  Volume  6B,  Chapter  76. 

[11]  S.  E.  Chick.  Input  distribution  selection  for  simulation  experiments:  Accounting  for  input  uncer¬ 
tainty.  Operations  Research,  49:744-758,  2001. 

[12]  M.  Cule,  R.J.  Samworth,  and  M.  Stewart.  Maximum  likelihood  estimation  of  a  multi-dimensional 
log-concave  density.  Journal  of  the  Royal  Statistical  Society  Series  B,  72:545-600,  2010. 

[13]  M.  Cule,  R.J.  Samworth,  and  M.  Stewart.  Rejoinder  to  maximum  likelihood  estimation  of  a  multi¬ 
dimensional  log-concave  density.  Journal  of  the  Royal  Statistical  Society  Series  B,  72:600-607, 
2010. 

[14]  M.  L.  Cule  and  R.  J.  Samworth.  Theoretical  properties  of  the  log-concave  maximum  likelihood 
estimator  of  a  multidimensional  density.  Electronic  J.  Statistics,  4:254-270,  2010. 

[15]  P.  Davies  and  A.  Kovac.  Densities,  spectral  densities  and  modality.  The  Annals  of  Statistics, 
32:1093-1136,  2004. 


35 


[16]  G.  M.  de  Montricher,  R.A.  Tapia,  and  J.R.  Thompson.  Nonparametric  maximum  likelihood  es¬ 
timation  of  probability  densities  by  penalty  function  method.  Annals  of  Statistics,  3:1329-1348, 
1975. 

[17]  L.  Dechevsky  and  S.  Penev.  On  shape-preserving  probabilistic  wavelet  approximators.  Stochastic 
Analysis  and  Applications,  15:187-215,  1997. 

[18]  M.  Delecroix  and  C.  Thomas-Agnan.  Spline  and  kernel  regression  under  shape  restrictions.  In 
M.  G.  Schimek,  editor,  Smoothing  and  Regression,  pages  109-134.  Wiley,  2000. 

[19]  R.A.  DeVore.  Monotone  approximation  by  polynomials.  SIAM  Journal  on  Mathematical  Analysis, 
8:906-921,  1977. 

[20]  R.A.  DeVore.  Monotone  approximation  by  splines.  SIAM  Journal  on  Mathematical  Analysis, 
8:891-905,  1977. 

[21]  M.  X.  Dong  and  R.  J-B  Wets.  Estimating  density  functions:  a  constrained  maximum  likelihood 
approach.  Journal  of  Nonparametric  Statistics,  12(4):549-595,  2007. 

[22]  G.R.  Doss.  Shape- Constrained  Inference  for  Concave- Transformed  Densities  and  their  Modes.  Phd 
dissertation.  University  of  Washington,  2013. 

[23]  L.  Dumbgen  and  K.  Ruhbach.  Maximum  likelihood  estimation  of  a  log-concave  density  and  its 
distribution  function:  Basic  properties  and  uniform  consistency.  Bernoulli,  15(l):40-68,  2009. 

[24]  L.  Dumbgen,  R.  J.  Samworth,  and  D.  Schuhmacher.  Approximation  by  log-concave  distributions 
with  applications  to  regression.  Annals  of  Statistics,  39:702-730,  2011. 

[25]  P.B.  Eggermont  and  V.N.  LaRiccia.  Maximum  Penalized  Likelihood  Estimation,  Volume  I:  Density 
Estimation.  Springer,  2001. 

[26]  Y.  Feng,  D.  Gade,  S.  Ryan,  J.-P.  Watson,  R.  Wets,  and  D.  Woodruff.  A  new  approximation  method 
for  generating  day-ahead  load  scenarios.  In  2013  IEEE  Power  &  Energy  Society  Ceneral  Meeting. 
IEEE,  2013. 

[27]  M.  Freimer  and  L.  W.  Schruben.  Collecting  data  and  estimating  parameters  for  input  distributions. 
In  Proceedings  of  the  2002  Winter  Simulation  Conference,  2002. 

[28]  F.  Gao  and  J.  A.  Wellner.  On  the  rate  of  convergence  of  the  maximum  likelihood  estimator  of  a 
k-monotone  density.  Science  in  China  Series  A:  Mathematics,  52(7),  2009. 

[29]  S.  Geman  and  C.-R.  Hwang.  Nonparametric  maximum  likelihood  estimation  by  the  method  of 
sieves.  The  Annals  of  Statistics,  10(2):401-414,  1982. 

[30]  I.  J.  Good  and  R.  A.  Gaskin.  Nonparametric  roughness  penalties  for  probability  densities. 
Biometrika,  58:255-277,  1971. 

[31]  U.  Grenander.  On  the  theory  of  mortality  measurement.  I.  Skandinavisk  Aktuarietidskrift,  39:70- 
96,  1956. 


36 


[32]  U.  Grenander.  On  the  theory  of  mortality  measurement.  II.  Skandinavisk  Aktuarietidskrift,  39:125- 
153,  1956. 

[33]  U.  Grenander.  Abstract  Inference.  Wiley,  1981. 

[34]  P.  Groenenboom,  G.  Jongbloed,  and  J.A.  Wellner.  Estimation  of  a  convex  function:  characteriza¬ 
tions  and  asymptotic  theory.  Annals  of  Statistics,  29,  2001. 

[35]  P.  Groenenboom  and  J.A.  Wellner.  Information  bounds  and  nonparametric  maximum  likelihood 
estimation.  Birkhauser,  Basel,  1992. 

[36]  P.  Hall  and  K.-H.  Kang.  Unimodal  kernel  density  estimation  by  data  sharpening.  Statistica  Sinica, 
15:73-98,  2005. 

[37]  G.  Jongbloed.  The  iterative  convex  minorant  algorithm  for  nonparametric  estimation.  Journal  of 
Computational  and  Graphical  Statistics,  7:310-321,  1998. 

[38]  A.J.  King  and  R.T.  Rockafellar.  Asymptotic  theory  for  solution  of  genaralized  M-estimation  and 
stochastic  programming.  Technical  Report  WP-90-76,  International  Institute  for  Applied  Systems 
Analysis,  Laxenburg,  Austria,  1990. 

[39]  V.  K.  Klonias.  Gonsistency  of  two  nonparametric  maximum  penalized  likelihood  estimators  of  the 
probability  density  function.  Annals  of  Statistics,  10:811-824,  1982. 

[40]  R.  Koenker  and  I.  Mizera.  Density  estimation  by  total  variation  regularization.  In  A  Festschrift 
for  Kjell  Doksum.  World  Scientific,  Singapore,  2006. 

[41]  R.  Koenker  and  I.  Mizera.  Primal  and  dual  formulations  relevant  for  the  numerical  estimation 
of  a  density  function  via  regularization.  In  A.  Pazman,  J.  Volaufova,  and  V.  Witkovsky,  editors. 
Proceedings  of  the  Conference  ProbStat  ’06,  volume  38.  Tatra  Mountain  Mathematical  Publications, 
2008. 

[42]  R.  Koenker  and  I  Mizera.  Quasi-concave  density  estimation.  Annals  of  Statistics,  38:2998-3027, 

2010. 

[43]  T.  Leonard.  Density  estimation,  stochastic  processes  and  prior  information.  Journal  of  the  Royal 
Statistical  Society,  B40:113-146,  1978. 

[44]  E.  Lim  and  P.  W.  Glynn.  Gonsistency  of  multidimensional  convex  regression.  Operations  Research, 
60:196-208,  2012. 

[45]  M.  Meyer.  Gonstrained  penalized  splines.  Canadian  Journal  of  Satistics,  40:190-206,  2012. 

[46]  M.  Meyer.  Nonparametric  estimation  of  a  smooth  density  with  shape  restrictions.  Statistica  Sinica, 
22:681-701,  2012. 

[47]  M.  Meyer  and  D.  Habtzghib.  Nonparametric  estimation  of  density  and  hazard  rate  functions  with 
shape  restrictions.  Journal  of  Nonparametric  Statistics,  23(2):455-470,  2011. 


37 


[48]  J.  Kumar  Pal,  M.  Woodroofe,  and  M.  Meyer.  Estimating  a  polya  frequency  function.  In  R.  Liu, 
W.  Strawderman,  and  C.-H.  Zhang,  editors.  Complex  datasets  and  inverse  problems,  pages  239- 
249.  Beachwood,  OH,  2007.  IMS  Lecture  Notes  Monogr.  Ser.,  volume  54. 

[49]  D.  Papp.  Estimation  problems  involving  nonnegative  polynomials  and  their  restrictions.  PhD 
dissertation,  Rutgers  University,  2011. 

[50]  D.  Papp  and  F.  Alizadeh.  Shape  constrained  estimations  using  nonnegative  splines.  Journal  of 
Computational  and  Craphical  Statistics,  23(1):211-231,  2014. 

[51]  R.  Pasupathy.  On  choosing  parameters  in  retrospective-approximation  algorithms  for  stochastic 
root  finding  and  simulation  optimization.  Operations  Research,  58:889-901,  2010. 

[52]  G.  H.  Pflug  and  R.  J-B  Wets.  Shape  restricted  nonparametric  regression  with  overall  noisy  mea¬ 
surements.  Journal  of  Nonparametric  Statistics,  25:323-338,  2013. 

[53]  J.S.  Racine.  Computational  tools  for  nonparametric  estimation, 

https:/ /www. economics. mcmaster.ca/people/racinej,  2015. 

[54]  L.  Reboul.  Estimation  of  a  function  under  shape  restrictions,  applications  to  reliability.  The  Annals 
of  Statistics,  33:1330-1356,  2005. 

[55]  1.  Rios,  R.  Wets,  and  D.  Woodruff.  Multi-period  forecasting  and  scenarios  generation  with  limited 
data.  Computational  Management  Science,  in  review. 

[56]  R.  T.  Rockafellar  and  R.  J-B.  Wets.  Variational  analysis.  Springer,  New  York,  NY,  1998. 

[57]  J.  O.  Royset  and  R.  J-B  Wets.  On  function  identification  problems.  Naval  Postgraduate  School, 
Monterey,  California,  2014. 

[58]  J.O.  Royset,  N.  Sukumar,  and  R.  J-B  Wets.  Uncertainty  quantification  using  exponential  epi- 
splines.  In  Proceedings  of  the  International  Conference  on  Structural  Safety  and  Reliability,  2013. 

[59]  J.O.  Royset  and  R.  Wets.  XSPL  estimator  in  matlab.  http://faculty.nps.edu/joroyset/XSPL.html, 
2013. 

[60]  K.  Rufiback.  Computing  maximum  likelihood  estimators  of  a  log-concave  density  function.  Journal 
of  Statistical  Computation  and  Simulation,  77:561-574,  2007. 

[61]  A.  Seregin  and  J.  A.  Wellner.  Nonparametric  estimation  of  multivariate  convex-transformed  den¬ 
sities.  Annals  of  Statistics,  38(6):3751-3781,  2010. 

[62]  A.  Shapiro,  D.  Dentcheva,  and  A.  Ruszczynski.  Lectures  on  Stochastic  Programming:  Modeling 
and  Theory.  SIAM,  Philadelpha,  PA,  2009. 

[63]  M.  Silvapulle  and  P.  Sen.  Constrained  Statistical  Inference.  Wiley  Series  in  Probability  and 
Statistics.  Wiley,  New  York,  NY,  2005. 

[64]  B.  W.  Silverman.  On  the  estimation  of  a  probability  density  function  by  the  maximum  penalized 
likelihood  method.  Annals  of  Statistics,  10:795-810,  1982. 


38 


[65]  D.  Singham,  J.O.  Royset,  and  R.  J-B  Wets.  Density  estimation  of  simulation  output  using  expo¬ 
nential  epi-splines.  In  Proceedings  of  the  2013  Winter  Simulation  Conference,  2013. 

[66]  R.  Sood  and  R.  Wets.  Information  fusion,  http://www.math.ucdavis.edu/~prop01,  2011. 

[67]  J.  R.  Thompson  and  R.  A.  Tapia.  Nonparametric  Function  Estimation,  Modeling,  and  Simulation. 
SIAM  Publishers,  Philadelphia,  PA,  1990. 

[68]  A.B.  Tsybakov.  Introduction  to  Nonparametric  Estimation.  Springer,  2009. 

[69]  B.  A.  Turlach.  Shape  constrained  smoothing  using  smoothing  splines.  Computational  Statistcs, 
20(1):81-103,  2005. 

[70]  G.  Wahba.  Spline  Models  for  Observational  Data.  SIAM,  1990. 

[71]  G.  Walther.  Detecting  the  presence  of  mixing  with  multiscale  maximum  likelihood.  Journal  of  the 
American  Statistical  Association,  97:508-513,  2002. 

[72]  G.  Walther.  Inference  and  modeling  with  log-concave  distributions.  Statistical  Science,  24(3) :319- 
327,  2009. 

[73]  J.  Wang.  Asymptotics  of  least-squares  estimators  for  constrained  nonlinear  regression.  Annals  of 
Statistics,  24(3):  1316-1326,  1996. 

[74]  R.  Wets  and  I.  Rios.  Modeling  and  estimating  commodity  prices:  copper  prices.  Mathematics  of 
Einance  and  Economics,  in  review. 

[75]  M.A.  Wolters.  A  greedy  algorithm  for  unimodal  kernel  density  estimation  by  data  sharpening. 
Journal  of  Statistical  Software,  47(6):l-26,  2012. 


39 


